Abstract

The pressing need for efficient compression schemes for XML documents has recently
focused attention on stack computation [11, 17]. In particular, it calls for a formulation of
information-lossless stack or pushdown compressors that allows a formal analysis of their
performance and a more ambitious use of the stack in XML compression, where so far
it is mainly connected to parsing mechanisms. In this paper we introduce the model of
pushdown compressor, based on pushdown transducers that compute a single injective
function while keeping the widest generality regarding stack computation.

We also consider online compression algorithms that use at most polylogarithmic space 
(plogon). These algorithms correspond to compressors in the data stream model. 

We compare the performance of these two families of compressors with each other and 
with the general purpose Lempel-Ziv algorithm. This comparison is made without any a 
priori assumption on the data's source and considering the asymptotic compression ratio 
for infinite sequences. We prove that in all cases they are incomparable. 

Keywords: compression algorithms, plogon, computational complexity, data stream al- 
gorithms, Lempel-Ziv algorithm, pushdown compression. 

1 Introduction 

The compression algorithms required for today's massive data applications necessarily
fall under very limited resource restrictions. In the data stream setting, the
algorithm receives a stream of elements one by one and can only store a brief summary of
them; in fact, the amount of available memory is far below linear [3, 14]. In the context
of XML databases, where the main limiting factor is document size, syntax-directed
compression is particularly appropriate, i.e. compression centered on the grammar-based
generation of XML texts and performed with stack memory [11, 17].

In this paper we introduce and formalize useful compression mechanisms that can be 
implemented within low resource-bounds, namely pushdown compressors and polylogarithmic 
space online compression algorithms. We compare these two with each other and with the
general purpose Lempel-Ziv algorithm [18].

Finite state compressors were extensively used and studied before the celebrated result 
of Lempel and Ziv [18] that their algorithm is asymptotically better than any finite-state
compressor. However, until recently the natural extension of finite-state to pushdown
compressors has received much less attention, a situation that has changed due to new specialized
compressors for XML. The work done on stack transducers has been basic and closely connected
to parsing mechanisms. Transducers were initially considered by Ginsburg and Rose in [9]
for language generation, further corrected in [10], and summarized in [5]. For these models
the role of nondeterminism is especially useful in the concept of λ-rule, that is, a transition in
which a symbol is popped from the stack without reading any input symbol.

We introduce here the concept of pushdown compressor as the most general stack
transducer that is compatible with information-lossless compression. We allow the use of λ-rules
while having a deterministic (unambiguous) model. The existence of endmarkers is also
allowed, since it allows the compressor to move away from mere prefix extension. A more
feasible model will also be considered where the pushdown compressor is required to be
invertible by a pushdown transducer (see Section 3.1). As mentioned before, stack compression
is especially adequate for XML texts and has been extensively used [11, 17]. We will also
consider an even more restrictive computation model, known as visibly pushdown automata
[1, 15], on which XML compression can be performed.

Polylogarithmic space online compressors (plogon) are compression algorithms that use at
most polylogarithmic memory while accessing the input only once. This type of algorithm
models the compression that can actually be performed in the setting of data streams, where
sublinear space bounds and online input access are assumed, with constant and polylogarithmic
being the main bounds [3, 14].

For the comparison of different compression mechanisms we consider asymptotic compres- 
sion ratio for infinite sequences, and without any a priori assumption on the data's source. 
Notice that this excludes results that assume a certain probability distribution on the data, for 
instance the fact that under an ergodic source, the Lempel-Ziv compression coincides exactly 
with the entropy of the source with high probability on finite inputs [18] . This last result is 
useful when the data source is known, but it is not informative for arbitrary inputs, i.e. when 
the data source is unknown (notice that an infinite sequence is Lempel-Ziv incompressible 
with probability one). Therefore for the comparison of compression algorithms on general 
sequences, either an experimental or a formal approach is needed, such as that used in [16].
In this paper we follow [16] using a worst case approach, that is, we consider asymptotic
performance on every infinite sequence. 

We prove that the performance of plogon compressors, pushdown compressors and Lempel- 
Ziv's compression scheme is incomparable in the strongest sense. For each two of these three 
mechanisms we construct a sequence that is compressed optimally in one scheme but is not 
in the other, and vice-versa. In all cases the separation is the strongest possible, i.e. optimal 
compressibility is achieved in the worst case (i.e. almost all prefixes of the sequence are 
optimally compressible), whereas incompressibility is present even in the best case (i.e. only 
finitely many prefixes of the sequence are compressible). 

For the comparison of pushdown transducers with both plogon and Lempel Ziv, we use 
the most general pushdown model (where the pushdown compressor need not be invertible by 
a pushdown transducer) for incompressibility and the more restrictive (where the pushdown 
compressor is required to be invertible by a pushdown transducer) for compressibility, thus 
obtaining the tightest results.

The proofs are interesting by themselves, since the witnesses of each of the separations 
proved show the strengths and drawbacks of each of the compression mechanisms. For
instance, pushdown compressors cannot take advantage of patterns, while the Lempel-Ziv algorithm
compresses well even non-correlative repetitions, and plogon machines require extra
information to compress this kind of data.

This paper contains a revised version of the results in [2] and [21].

The paper is organized as follows. Section 2 contains some preliminaries. In Section 3 we
present pushdown compressors and plogon compressors along with some basic properties and
notations, as well as a review of the Lempel-Ziv (LZ78) algorithm. In Section 4 we present
our main results. We end with a brief conclusion on connections and consequences of these 
results for effective dimension and prediction algorithms. 



2 Preliminaries 

Let us fix some notation for strings and languages. Let Σ be a finite alphabet with at least two
symbols. W.l.o.g. we assume that 0, 1 ∈ Σ. A string is an element of Σ^n for some integer n
and a sequence is an element of Σ^∞. For a string x, its length is denoted by |x|. If x, y are
strings, we write x < y (called lexicographic order) if |x| < |y| or |x| = |y| and x precedes y
in alphabetical order. The empty string is denoted by λ. For S ∈ Σ^∞ and i, j ∈ N, we write
S[i..j] for the string consisting of the i-th through j-th symbols of S, with the convention that
S[i..j] = λ if i > j, and S[1] is the leftmost symbol of S. We say string y is a prefix of string
(sequence) x, denoted y ⊑ x, if there exists a string (sequence) a such that x = ya. For a
string x, x^{-1} denotes x written in reverse order. For a function f : A → B, f(x) = ⊥ means
f is not defined on input x. For a sum Σ_{k=1}^{n} a_k, term(k) denotes a_k. For a function f, f^{(2)}
denotes f ∘ f.

Given a sequence S and a function T : Σ* → Σ*, the lower and upper T-compression
ratios of S are given by

\[ \rho_T(S) = \liminf_{n\to\infty} \frac{|T(S[1\ldots n])|}{n} \quad\text{and}\quad R_T(S) = \limsup_{n\to\infty} \frac{|T(S[1\ldots n])|}{n}. \]
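The two ratios are limits over all prefixes, so they cannot be computed from finite data. As a finite illustration only, the following sketch evaluates |T(S[1..n])|/n on every prefix of a finite string and takes the minimum and maximum over a tail of prefixes as stand-ins for the liminf and limsup. The names `prefix_ratios`, `tail_min_max` and the toy map `drop_every_other` are hypothetical; the toy map is lossy and is not a compressor in the sense of this paper.

```python
from typing import Callable, List, Tuple


def prefix_ratios(s: str, transform: Callable[[str], str]) -> List[float]:
    """Return |T(S[1..n])| / n for n = 1, ..., len(s)."""
    return [len(transform(s[:n])) / n for n in range(1, len(s) + 1)]


def tail_min_max(ratios: List[float], start: int) -> Tuple[float, float]:
    """Finite stand-ins for liminf and limsup: min and max over n >= start."""
    tail = ratios[start - 1:]
    return min(tail), max(tail)


def drop_every_other(w: str) -> str:
    """Toy, lossy map used only to exercise the arithmetic."""
    return w[::2]


ratios = prefix_ratios("01" * 50, drop_every_other)
low, high = tail_min_max(ratios, start=10)
print(low, high)
```

On this toy input the ratio is exactly 1/2 on even-length prefixes and slightly above 1/2 on odd-length ones, so the two finite stand-ins get closer as the tail starts later.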



Notation. We use K(w) to denote the standard (plain) Kolmogorov complexity, that is,
fix a universal Turing machine U. Then for each string w ∈ Σ*,

K(w) = min{|p| : p ∈ {0, 1}*, U(p) = w},

i.e., K(w) is the size of the shortest binary program that makes U output w. Although some
authors use C(w) to denote (plain) Kolmogorov complexity, we reserve this notation to denote
a particular compression algorithm C on input w.



3 Compressors with low resource-bounds 

In this section we consider several families of lossless compression methods that use very low 
computing resources. We introduce a detailed definition of stack-computable compressors 
together with some variants and review poly-logarithmic space computable compressors and 
the celebrated Lempel-Ziv algorithm. 

3.1 Pushdown compressors

We discuss next different formalizations of information lossless compressors that are equipped 
with stack memory. The most general ones are allowed to use a bounded number of lambda- 
rules, that is, stack movements that don't consume an input symbol. The most restricted 
pushdown compressors we consider here are visibly pushdown automata that are suitable for 
XML compression. 

There are several natural variants for the model of pushdown transducer [5], both allowing
different degrees of nondeterminism and computing partial (multi) functions by requiring final 
state or empty stack termination conditions. But our purpose here is to compute a total and 
well-defined (single valued) function, therefore nondeterminism should be very limited and 
natural termination conditions are equivalent. 

The main variants that will influence the computing power of a pushdown compressor while 
remaining information lossless are the presence of lambda-rules, the possible restrictions of 
stack movements, and the use of an endmarker, that is an extra symbol signaling the end of 
the finite input. 

We will introduce here pushdown compressors, invertible pushdown compressors, and
visibly pushdown compressors (this last one defined in [1, 15]).

The definitions below are adapted from those in [2, 21].
Definition. A bounded pushdown compressor (BPDC) is an 8-tuple

C = (Q, Σ, Γ, δ, ν, q_0, z_0, c)

where

• Q is a finite set of states

• Σ is the finite input/output alphabet

• Γ is the finite stack alphabet

• δ : Q × (Σ ∪ {λ}) × Γ → Q × Γ* is the transition function

• ν : Q × Σ × Γ → Σ* is the output function

• q_0 ∈ Q is the initial state

• z_0 ∈ Γ is the start stack symbol

• c ∈ N is an upper bound on the number of λ-rules per input symbol.

We use δ_Q and δ_{Γ*} for the projections of function δ. We restrict δ so that z_0 cannot be
removed from the stack bottom, that is, for every q ∈ Q, b ∈ Σ ∪ {λ}, either δ(q, b, z_0) = ⊥,
or δ(q, b, z_0) = (q', v z_0), where q' ∈ Q and v ∈ Γ*.

Note that the transition function δ accepts λ as an input character in addition to elements
of Σ, which means that C has the option of not reading an input character while altering the
stack; such a movement is called a λ-rule. In this case δ(q, λ, a) = (q', λ), that is, we pop the
top symbol of the stack. To enforce determinism, we require that at least one of the following
hold for all q ∈ Q and a ∈ Γ:

• δ(q, λ, a) = ⊥,

• δ(q, b, a) = ⊥ for all b ∈ Σ.



We restrict the number of λ-rules that can be applied as follows: between the input symbols
in positions n and n + 1 a maximum of c λ-rules can be applied.

We first consider the transition function δ as having inputs in Q × (Σ ∪ {λ}) × Γ+, meaning
that only the top symbol of the stack is relevant. Then we use the extended transition function
δ* : Q × Σ* × Γ+ → Q × Γ*, defined recursively as follows. For q ∈ Q, v ∈ Γ+, w ∈ Σ*, and
b ∈ Σ,

\[ \delta^*(q,\lambda,v) = \begin{cases} \delta^*(\delta_Q(q,\lambda,v),\lambda,\delta_{\Gamma^*}(q,\lambda,v)), & \text{if } \delta(q,\lambda,v) \neq \bot, \\ (q,v), & \text{otherwise,} \end{cases} \]

\[ \delta^*(q,wb,v) = \begin{cases} \delta^*(\delta_Q(\delta^*_Q(q,w,v),b,\delta^*_{\Gamma^*}(q,w,v)),\lambda,\delta_{\Gamma^*}(\delta^*_Q(q,w,v),b,\delta^*_{\Gamma^*}(q,w,v))), & \text{if } \delta^*(q,w,v) \neq \bot \text{ and } \delta(\delta^*_Q(q,w,v),b,\delta^*_{\Gamma^*}(q,w,v)) \neq \bot, \\ \bot, & \text{otherwise.} \end{cases} \]

That is, λ-rules are implicit in the definition of δ*. We abbreviate δ* to δ, and δ(q_0, w, z_0)
to δ(w). We define the output from state q on input w ∈ Σ* with z ∈ Γ* on the top of the
stack by the recursion ν(q, λ, z) = λ,

ν(q, wb, z) = ν(q, w, z) ν(δ_Q(q, w, z), b, δ_{Γ*}(q, w, z)).

The output of the compressor C on input w ∈ Σ* is the string C(w) = ν(q_0, w, z_0).
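The mechanics of the definition can be sketched as a small simulator. This is only an illustration under stated assumptions: the function `run_bpdc`, the dictionary encoding of δ and ν, and the toy machine `TOY_DELTA`/`TOY_NU` are hypothetical and not taken from the paper. A λ-rule is keyed by `None`, replacement strings are written top-first, and at most c λ-rules are applied after each input symbol. The toy machine copies its input to the output (so it is trivially information-lossless) while pushing one A per 'a' and flushing the stack with λ-rules after a 'b'.

```python
from typing import Dict, List, Optional, Tuple

State = str
# (state, input symbol or None for a lambda-rule, top of stack)
#   -> (new state, string replacing the top symbol, written top-first)
Delta = Dict[Tuple[State, Optional[str], str], Tuple[State, str]]
Nu = Dict[Tuple[State, str, str], str]


def run_bpdc(delta: Delta, nu: Nu, q0: State, z0: str, c: int,
             w: str) -> Optional[Tuple[str, State, str]]:
    """Return (output, final state, stack top-first), or None if undefined."""
    state = q0
    stack: List[str] = [z0]          # the top of the stack is stack[-1]

    def step(symbol: Optional[str]) -> bool:
        nonlocal state
        rule = delta.get((state, symbol, stack[-1]))
        if rule is None:
            return False
        state, replacement = rule
        stack.pop()
        stack.extend(reversed(replacement))
        return True

    def lambda_closure() -> None:
        for _ in range(c):           # at most c lambda-rules in a row
            if not step(None):
                break

    out: List[str] = []
    lambda_closure()
    for b in w:
        # nu is evaluated on the state and stack reached before reading b.
        emitted = nu.get((state, b, stack[-1]), "")
        if not step(b):
            return None              # the transition is undefined
        out.append(emitted)
        lambda_closure()
    return "".join(out), state, "".join(reversed(stack))


# Hypothetical toy machine over {a, b} with stack alphabet {Z, A}.
# The bottom symbol Z is never popped, so the stack is never empty.
TOY_DELTA: Delta = {
    ("q", "a", "Z"): ("q", "AZ"),
    ("q", "a", "A"): ("q", "AA"),
    ("q", "b", "Z"): ("q", "Z"),
    ("q", "b", "A"): ("f", "A"),
    ("f", None, "A"): ("f", ""),     # lambda-rule: pop without reading input
    ("f", "a", "Z"): ("q", "AZ"),
    ("f", "b", "Z"): ("q", "Z"),
}
TOY_NU: Nu = {(s, x, t): x for s in "qf" for x in "ab" for t in "ZA"}

print(run_bpdc(TOY_DELTA, TOY_NU, "q", "Z", 3, "aaaba"))
```

In the toy machine the pair (f, A) has a λ-rule and no reading rule, which matches the determinism condition above. With c = 2 the stack is not fully flushed after "aaab", so reading a further 'a' is undefined in this sketch.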

The input of an information-lossless compressor can be reconstructed from the output and 
the final state reached on that input. 

Definition. A BPDC C = (Q, Σ, Γ, δ, ν, q_0, z_0, c) is information-lossless (IL) if the function

Σ* → Σ* × Q
w ↦ (C(w), δ_Q(w))

is one-to-one. An information-lossless pushdown compressor (ILPDC) is a BPDC that is IL.

Intuitively, a BPDC compresses a string w if |C(w)| is significantly less than |w|. Of
course, if C is IL, then not all strings can be compressed. Our interest here is in the degree
(if any) to which the prefixes of a given sequence S ∈ Σ^∞ can be compressed by an ILPDC.

We will also consider PDC that have endmarkers, a characteristic that can achieve a better 
compression rate. 

Definition. An information-lossless pushdown compressor with endmarkers (ILPDCwE) is
a BPDC C = (Q, Σ ∪ {$}, Γ, δ, ν, q_0, z_0, c) with input alphabet Σ ∪ {$} ($ ∉ Σ) such that the
function

Σ* → Σ* × Q
w ↦ (C(w$), δ_Q(w))

is one-to-one.

Notice that the use of endmarkers can improve compression. In particular each ILPDC 
is a particular case of ILPDC with endmarkers, but there are ILPDC with endmarkers that 
perform better than usual ILPDC. 

We will denote as pushdown compression ratio the concept corresponding to the most
general family of pushdown compressors, those that use endmarkers.

Notation. The best-case pushdown compression ratio of a sequence S ∈ Σ^∞ is ρ_PD(S) =
inf{ρ_C(S) | C is an ILPDCwE}.

The worst-case pushdown compression ratio of a sequence S ∈ Σ^∞ is R_PD(S) = inf{R_C(S) |
C is an ILPDCwE}.

Notice that so far we have not required that the computation should be invertible by an- 
other pushdown transducer, which is a natural requirement for practical compression schemes. 
The standard PD compression model does not guarantee the decompression to be feasible and 
it is currently not known whether the exponential time brute force inversion can even be im- 
proved to polynomial time. To guarantee both decompression and compression to be feasible, 
we require the existence of a PD machine that given the compressed string (and the final 
state), outputs the decompressed one. This yields two PD compression schemes, the stan- 
dard one (PD) and invertible PD. Contrary to Finite State computation, it is not known 
whether both are equivalent. This is by no means a limitation, since all results in this paper 
are always stated in the strongest form, i.e. we obtain results of the form "X beats PD" and 
"invertible PD beats X". 

Here is the definition of invertible PD compressors. We want this definition to be the
most restrictive one and therefore use regular ILPDC (without endmarkers).

Definition. (C, D) is an invertible PD compressor (denoted invPD) if C is an ILPDC and
D is a PD transducer s.t. D(C(w), δ_Q(w)) = w, i.e. D, given both C(w) and the final state,
outputs w.

Notation. The best-case invertible pushdown compression ratio of a sequence S ∈ Σ^∞ is
ρ_invPD(S) = inf{ρ_C(S) | C is an invPD}.

The worst-case invertible pushdown compression ratio of a sequence S ∈ Σ^∞ is R_invPD(S) =
inf{R_C(S) | C is an invPD}.

We end this section with the concept of visibly pushdown automata from [1, 15] that is
extensively used in the compression of XML. 

A visibly pushdown compressor (visiblyPD) is an information-lossless pushdown compres- 
sor for which the input alphabet has three types of symbols, call symbols, return symbols, 
and internal symbols. The main restriction is that while reading a call, the automaton must 
push one symbol, while reading a return symbol, it must pop one symbol (if the stack is 
non-empty), and while reading an internal symbol, it can only update its control state. 
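The stack discipline just described can be sketched in a few lines: the stack height is determined by the input alone, since calls push, returns pop (when possible) and internal symbols leave the stack unchanged. The function `stack_heights` and the choice of "(" and ")" as call and return symbols are assumptions made for illustration; control states and output are omitted.

```python
from typing import List, Set


def stack_heights(word: str, calls: Set[str], returns: Set[str]) -> List[int]:
    """Stack height of a visibly pushdown machine after each input symbol."""
    height = 0
    heights = []
    for symbol in word:
        if symbol in calls:
            height += 1                  # a call pushes exactly one symbol
        elif symbol in returns:
            if height > 0:
                height -= 1              # a return pops if the stack is non-empty
        # internal symbols leave the stack untouched
        heights.append(height)
    return heights


print(stack_heights("(a(b))", {"("}, {")"}))
```

For XML-like input, opening tags would play the role of calls and closing tags the role of returns.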

Therefore the compression ratio attained by visibly pushdown automata is an upper bound 
on the compression ratio attained through the pushdown compressors defined above. 

3.2 plogon compressors 

We introduce the family of compressors that can be computed online with at most poly- 
logarithmic space. Notice that these resource bounds correspond to those of the data stream 
model [14], where the input size is massive in comparison with the available memory, and
the input can only be read once. 

Definition. (Hartmanis, Immerman, Mahaney [12]) A Turing machine M is a plogon
transducer if it has the following properties, for each input string w:

• the computation of M(w) reads its input from left to right (no turning back),

• M(w) is given |w| written in binary (on a special tape),

• M(w) writes the output from left to right on a write-only output tape,

• M(w) uses memory bounded by log(|w|)^c, for a constant c.

We denote with plogon the class of plogon transducers.

Note that contrary to finite-state transducers, a plogon transducer is not necessarily a
mere extender, i.e., there is a plogon transducer M and strings w, x such that M(w) is not a
prefix of M(wx).

Definition. A plogon transducer C : Σ* → Σ* is an information lossless compressor (ILplog)
if it is 1-1.
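As a concrete, deliberately simple instance of a one-pass injective transducer with small memory, consider run-length encoding: it reads the input once from left to right, writes its output left to right, and keeps only the current symbol and one counter, which takes O(log |w|) bits. This sketch is not from the paper; the names `rle_stream` and `rle_decode` and the output format `<symbol><count>;` are assumptions, and the alphabet is assumed not to contain ';'. Unlike the model above, it does not use the input length.

```python
from typing import Iterable, Iterator


def rle_stream(symbols: Iterable[str]) -> Iterator[str]:
    """One-pass run-length encoder emitting '<symbol><count>;' per run."""
    current = None
    count = 0
    for s in symbols:
        if s == current:
            count += 1
        else:
            if current is not None:
                yield f"{current}{count};"
            current, count = s, 1
    if current is not None:
        yield f"{current}{count};"


def rle_decode(code: str) -> str:
    """Inverse map, showing that the encoder is one-to-one."""
    return "".join(run[0] * int(run[1:]) for run in code.split(";") if run)


encoded = "".join(rle_stream("0001100000"))
print(encoded, rle_decode(encoded))
```

Run-length encoding compresses long runs only; the separation results below concern what the whole class of such transducers can and cannot achieve.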

Notation. The best-case plogon compression ratio of a sequence S ∈ Σ^∞ is ρ_plogon(S) =
inf{ρ_C(S) | C is an ILplog}.

The worst-case plogon compression ratio of a sequence S ∈ Σ^∞ is R_plogon(S) = inf{R_C(S) |
C is an ILplog}.

3.3 Lempel Ziv compression scheme 

Let us give a brief description of the classical LZ78 algorithm [18]. Given an input x ∈ Σ*, LZ
parses x in different phrases x_i, i.e., x = x_1 x_2 ... x_n (x_i ∈ Σ*) such that every proper prefix y
of x_i appears before x_i in the parsing (i.e. there exists j < i s.t. x_j = y). Therefore for every i,
x_i = x_{l(i)} b_i for l(i) < i and b_i ∈ Σ. We sometimes denote the number of phrases in the parsing
of x as P(x). After step i of the algorithm, the first i phrases x_1, ..., x_i have been parsed
and stored in the so-called dictionary. Thus, each step adds one word to the dictionary.

LZ encodes x_i by a prefix-free encoding of l(i) and the symbol b_i, that is, if x = x_1 x_2 ... x_n
as before, the output of LZ on input x is

LZ(x) = c_{l(1)} b_1 c_{l(2)} b_2 ... c_{l(n)} b_n

where c_i is a prefix-free coding of i (and x_0 = λ).

For a string z = xy we denote by LZ(y|x) the output of LZ on y after having read x
already.

LZ is usually restricted to the binary alphabet, but the description above is valid for any
alphabet Σ.
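The parsing described above can be sketched as follows. The functions `lz78_parse` and `lz78_unparse` are illustrative names; the sketch returns the pairs (l(i), b_i) rather than a bit string, since the description leaves the prefix-free code c_i unspecified. One assumption goes beyond the description: if the input ends in the middle of a phrase, the leftover is encoded as one more pair, which duplicates an existing dictionary word.

```python
from typing import List, Tuple


def lz78_parse(x: str) -> List[Tuple[int, str]]:
    """Return the pairs (l(i), b_i); index 0 denotes the empty phrase."""
    dictionary = {"": 0}
    pairs: List[Tuple[int, str]] = []
    phrase = ""
    for symbol in x:
        if phrase + symbol in dictionary:
            phrase += symbol             # keep extending a known phrase
        else:
            pairs.append((dictionary[phrase], symbol))
            dictionary[phrase + symbol] = len(dictionary)
            phrase = ""
    if phrase:
        # Leftover at the end of the input: the dictionary is prefix-closed,
        # so phrase[:-1] is guaranteed to be a dictionary word.
        pairs.append((dictionary[phrase[:-1]], phrase[-1]))
    return pairs


def lz78_unparse(pairs: List[Tuple[int, str]]) -> str:
    """Rebuild the input from the pairs, phrase by phrase."""
    phrases = [""]
    for index, symbol in pairs:
        phrases.append(phrases[index] + symbol)
    return "".join(phrases)


pairs = lz78_parse("aababcabcd")
print(pairs, lz78_unparse(pairs))
```

On the input "aababcabcd" the phrases are a, ab, abc, abcd, each extending the previous one by a single symbol.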

4 The performances of the LZ78 algorithm, plogon compres- 
sors and pushdown compressors are incomparable 

In this section we prove that the two families of compressors we have introduced, pushdown 
and plogon compressors, and the Lempel Ziv compression scheme, are all incomparable. That 
is, for any pair among those three, there are different individual sequences on which one is 
outperformed by the other and vice versa. In all cases we get low worst-case rate (R) for
one method versus high best-case rate (ρ) for the other, i.e. the widest possible separation
between them. 

4.1 Lempel Ziv beats Pushdown compression 

Our first result shows that there is a sequence that our most general family of pushdown 
compressors cannot compress and that is optimally compressible by Lempel Ziv. 

The proof is based on two intuitions that require a careful analysis. The first one is that
from a few Kolmogorov-random strings a much longer pushdown-incompressible string can 
be constructed. On the other hand, a sequence with enough (and non-consecutive) repeated 
substrings can be compressed optimally by Lempel-Ziv. 

Theorem 4.1 There exists a sequence S such that

R_LZ(S) = 0

and

ρ_PD(S) = 1.

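Proof. Consider the sequence S = S_1 S_2 ... where S_n is constructed as follows. Let x =
x_1 x_2 ... x_{n^2} (|x_i| = n) be a Kolmogorov-random string with K(x) ≥ n^3 log |Σ|. Let

\[ S_n = x_{i_1} \cdots x_{i_l} \]

where i_j ∈ {1, ..., n^2} for every 1 ≤ j ≤ l are indexes, defined later on. Let

\[ l = \frac{1}{n} \sum_{k=1}^{n} k \min(|\Sigma|^k, n^{2k/n+1}) \]

so that

\[ |S_n| = nl = \sum_{k=1}^{n} k \min(|\Sigma|^k, n^{2k/n+1}). \tag{1} \]

Let us show that for every ε > 0 and for n large enough

\[ n^{5-\varepsilon} < |S_n| < n^5. \tag{2} \]

We first prove the upper bound.

\[ |S_n| = \sum_{k=1}^{n} k \min(|\Sigma|^k, n^{2k/n+1}) \le n\,\mathrm{term}(n) \le n \cdot n \cdot n^{3} = n^5. \]

For the lower bound we have

\[ |S_n| = \sum_{k=1}^{n} k \min(|\Sigma|^k, n^{2k/n+1}) \ge \sum_{k=(1-\varepsilon/3)n}^{n} k \min(|\Sigma|^k, n^{2k/n+1}) \ge \frac{\varepsilon}{3}\, n\, \mathrm{term}\big((1-\tfrac{\varepsilon}{3})n\big) = \frac{\varepsilon}{3}\big(1-\tfrac{\varepsilon}{3}\big) n^{5-2\varepsilon/3} \ge n^{5-\varepsilon}. \]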

Let C_1, C_2, ... be an enumeration of all ILPDCwE such that C_i can be encoded in at most
i bits and such that a maximum of log^{(2)} i λ-rules can be applied per symbol. The following
claim shows that there are many C-incompressible strings x_i.

Claim 4.2 Let F_n = {C_1, ..., C_{log n}}. Let w ∈ Σ*.

1. Let C ∈ F_n. There are at least (1 - 1/(2 log n)) n^2 strings x_i (1 ≤ i ≤ n^2) such that

|C(w x_i)| - |C(w)| > n - 2√n.

2. There is a string x_i such that for every C ∈ F_n,

|C(w x_i)| - |C(w)| > n - 2√n.

Proof of Claim 4.2. After having read w, C is in state q, with stack content yz, where y
denotes the n log^{(2)} n topmost symbols of the stack (if the stack is shorter then y is the whole
stack). It is clear that while reading an x_i, C will not pop the stack below y.

Let T = (1 - 1/(2 log n)) n^2 and let C(q, yz, x_i$) denote the output of C when started in state
q on input x_i$ with stack content yz. Suppose the claim false, i.e. there exist more than
n^2 - T words x_i such that C(q, yz, x_i$) = p_i, ends in state q_i, and |p_i| ≤ n - 2√n + O(1)
(notice that the output on symbol $ is O(1)). Denote by G the set of such strings x_i. This
yields the following short program for x (coded with alphabet Σ):

p = (n, C, q, y, a_1 t_1 a_2 t_2 ... a_{n^2} t_{n^2})

where each comma costs less than 3 log |s|, where s is the element between two commas; a_i = 1
implies t_i = x_i, a_i = 0 implies x_i ∈ G and t_i = d(q_i) 01 d(|p_i|) 01 p_i (where d(z), for any string
z, is the string written with every symbol doubled), i.e. |t_i| ≤ n - √n. p is a program for
x: once n is known, each a_i t_i yields either x_i (if a_i = 1) or (p_i, q_i) (if a_i = 0). From (p_i, q_i),
simulating C(q, yz, u$) for each u ∈ Σ^n yields the unique u = x_i such that C(q, yz, u$) = p_i
and ends in state q_i. The simulations are possible, because C does not read its stack further
than y, which is given. We have

\[ |p| \le O(\log n) + n \log^{(2)} n + (n+1)T + (n^2 - T)(n - \sqrt{n}) \le n^3 - \frac{n^{2.5}}{2\log n} + O(n^2) \le n^3 - \frac{n^{2.5}}{4\log n} \]

which contradicts the randomness of x, thus proving part 1.

Let W_j be the set of strings x_i that are compressible by C_j; by 1., |W_j| ≤ n^2/(2 log n). Let
R = {x_i}_{i=1}^{n^2} - ∪_{j=1}^{log n} W_j be the set of strings incompressible by all C ∈ F_n. We have

|R| ≥ n^2 - log n · n^2/(2 log n) = n^2/2 ≥ 1.

This proves part 2. □

We finish the definition of S_n by picking x_{i_1} to be the first string fulfilling the second part
of Claim 4.2 for w = S_1 S_2 ... S_{n-1}. The construction is similar for all strings {x_{i_j}}_{j=2}^{l} by
taking w = S_1 S_2 ... S_{n-1} x_{i_1} ... x_{i_{j-1}}, thus ending the construction of S_n.

Let us show that ρ_PD(S) = 1. Let ε > 0. Let C = C_k be an ILPDCwE; then for almost
every n, and for all 0 ≤ t ≤ |S_n|/n, 0 ≤ i < n we have

\C{Si...Sn-iSn[l...tn + i]$)\ 



\Si... Sn-lSn[l . . .tn + i] 



^ ZUi^ - ^Vj)\Sj\/j + t{n - - 0(1) 



> 1 



Y.]Zt\Sj\ + {t + l)n 



EU I^^-I + + 1)" E l^.l + + l)n E •=! + (i + l)n 



v^n-l -4.5 

> 1 - e/4 - 0(1) (by Equation 



> 1 - e/2 



15-5 

0(l)(n — 1) term(n — 

§term(|) 
0(l)(n- l)(n- 1)4-5 



>l-^/2 

3V3'' 

≥ 1 - ε/2 - ε/2 = 1 - ε (choosing δ = 0.1)

i.e. ρ_PD(S) = 1.

We show that R_LZ(S) = 0. Suppose LZ has already parsed input S_1 ... S_{n-1} and has d_n
words in its dictionary (d_n ≤ n|S_n|). Let P be the parsing of S_n by LZ, let t_P be the size of
the largest string in P and let 1 ≤ k ≤ t_P. Let us compute the maximum number of strings
of size k in P. Any string u of size k in a parsing of S_n is of the form

u = x_{t_1}[t...n] x_{t_2} ... x_{t_{k/n}}

i.e. amounts to choose k/n strings x_{t_j} and the position 1 ≤ t ≤ n where u starts in x_{t_1}.
Therefore there are at most n · (n^2)^{k/n} = n^{1+2k/n} such words u of size k.

Let P_w be the worst-case parsing of S_n, that starts on an empty dictionary and parses all
possible strings of size k in S_n (for every k ≤ t_w), where t_w is the size of the largest string in
P_w, i.e., min(|Σ|, n^{1+2/n}) strings of size one are parsed, followed by min(|Σ|^2, n^{1+4/n}) strings
of size 2, ..., followed by min(|Σ|^k, n^{1+2k/n}) strings of size k, and so on. Because

\[ \sum_{k=1}^{n} k \min(|\Sigma|^k, n^{2k/n+1}) = |S_n| \]

we have t_w ≤ n.

Let p (resp. p_w) be the number of phrases in P (resp. P_w). We have p ≤ p_w, and
|LZ(S_n | S_1 ... S_{n-1})| ≤ p log(p + d_n). Since

\[ p_w = \sum_{k=1}^{t_w} \min(|\Sigma|^k, n^{2k/n+1}) \le n\,\mathrm{term}(n) = n^4 \]

we have

|LZ(S_n | S_1 ... S_{n-1})| ≤ n^4 log(n^4 + n|S_n|) ≤ n^{4+α}

where α > 0 can be arbitrarily small.

Let 0 ≤ t ≤ |S_n|/n, 0 ≤ i < n. We have



|LZ(5i . . . Sn-iSn[l ...tn + i])\ ^ E"=i \LZiS,\Si . . . + . . . 



|Si...S„„i5„[l...tn + i]| - E?=i|5i 



< — I - J 

2^7 = 1 Pjl 2^7 = 1 Pjl 



≤ ε/2 + ε/2 ≤ ε

i.e. R_LZ(S) = 0. □



4.2 Lempel Ziv beats plogon compressors 

The Lempel Ziv algorithm can also surpass plogon compressors. Our second comparison 
detects sequences on which Lempel-Ziv achieves optimal compression whereas a plogon com- 
pressor has the worst possible performance. The construction is based on repetition of Kol- 
mogorov random strings. We show that Lempel-Ziv works well on any repeated pattern, 
whereas in polylogarithmic space big patterns cannot be stored. 

Theorem 4.3 There exists a sequence S such that

R_LZ(S) = 0 and ρ_plogon(S) = 1.

The proof will use the following general property that bounds the output of Lempel-Ziv
on strings of the form w = u^n.

Lemma 4.4 Let n ∈ N and let u ∈ Σ*, where u ≠ λ. Define l = 1 + |u| and w = u^n. Consider
the execution of Lempel-Ziv on w starting from a dictionary containing d ≥ 0 phrases. Then
we have that

\[ |LZ(w)| \le \sqrt{2l|w|}\, \log\big(d + \sqrt{2l|w|}\big). \tag{3} \]

Proof of Lemma 4.4. Let us fix n and consider the execution of the Lempel-Ziv algorithm on
w: as it parses the word, it enlarges its dictionary of phrases. Fix an integer k and let us
bound the number of new words of size k in the dictionary. As the algorithm parses u^n, the
number of different words of size k in u^n is at most |u| (at most one beginning at each symbol
of u). Therefore we obtain a total of at most |u| different new words of size k in w. This total
is bounded from above by l = |u| + 1.

Therefore at the end of the algorithm and for all k, the dictionary contains at most l new
words of size k. We can now bound from above the size of the compressed image of w. Let
p be the number of new phrases in the parsing made by the Lempel-Ziv algorithm. The size of
the compression is then p log(p + d): indeed, the encoding of each phrase consists of a new
symbol and a pointer towards one of the p + d words of the dictionary. The only remaining
step is thus to evaluate the number p of new words in the dictionary.

Let us order the words of the dictionary by increasing length and call t_1 the total length
of the first l words (that is, the l smallest words), t_2 the total length of the l following words
(that is, words of index between l + 1 and 2l in the order), and so on: t_k is the sum of the
size of the words with index between (k - 1)l + 1 and kl. Since the sum of the size of all these
words is equal to |w|, we have

\[ |w| = \sum_{k \ge 1} t_k. \]

Furthermore, since for each k there are at most l new words of size k, the words taken into
account in t_k all have size at least k: hence t_k ≥ kl. Thus we obtain

\[ |w| = \sum_{k \ge 1} t_k \ge \sum_{k=1}^{p/l} kl \ge \frac{p^2}{2l}. \]

Hence p satisfies

\[ \frac{p^2}{2l} \le |w|, \quad\text{that is,}\quad p \le \sqrt{2l|w|}. \]

The size of the compression of w is p log(p + d) ≤ √(2l|w|) log(d + √(2l|w|)), which ends the
proof of Lemma 4.4.

□
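The counting step of the proof, namely that the number p of phrases in the parsing of w = u^n is at most √(2l|w|) with l = 1 + |u|, can be checked empirically. The sketch below starts from an empty dictionary (d = 0) and counts a possible incomplete last phrase as one extra phrase; the names `lz78_phrase_count` and `phrase_bound` are hypothetical.

```python
import math


def lz78_phrase_count(w: str) -> int:
    """Number of LZ78 phrases of w, starting from an empty dictionary."""
    seen = {""}
    phrase = ""
    count = 0
    for symbol in w:
        phrase += symbol
        if phrase not in seen:       # a new phrase ends at this symbol
            seen.add(phrase)
            count += 1
            phrase = ""
    return count + (1 if phrase else 0)   # leftover incomplete phrase, if any


def phrase_bound(u: str, n: int) -> float:
    """The bound sqrt(2 * l * |w|) with l = 1 + |u| and w = u repeated n times."""
    return math.sqrt(2 * (1 + len(u)) * len(u) * n)


for u in ("ab", "abc", "01101"):
    for n in (1, 5, 50, 500):
        print(u, n, lz78_phrase_count(u * n), round(phrase_bound(u, n), 1))
```

The phrase count grows like the square root of |w|, which is what makes the compressed size negligible compared with |w| on highly repetitive inputs.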

Proof of Theorem 4.3. Let A, c ∈ N with c > 7. For each i ∈ N, let R_i be a Kolmogorov
random string with |R_i| = i (i.e. K(R_i) ≥ i log |Σ| - A for A the constant just fixed). Let

\[ S_n = R_1 R_2^{2^c} \cdots R_n^{n^c} \]

(R_n^{n^c} means n^c copies of R_n) and let S be the infinite sequence having all S_n as prefixes.
The following three lemmas will analyze the performance of Lempel-Ziv on all prefixes of
S.

Lemma 4.5

\[ \frac{|LZ(S_n)|}{|S_n|} \le \frac{n^{(c+6)/2}}{n^{c+1}} \]

for n large enough.

Proof of Lemma 4.5. Denote by LZ(i | i - 1) the output of LZ on R_i^{i^c}, after having parsed
S_{i-1} already.

Using the notation of Lemma 4.4, let w = R_i^{i^c}; thus l = 1 + |R_i| = 1 + i and d ≤ |S_{i-1}| ≤
(i - 1)^{c+2}. Thus

\[ |LZ(i \mid i-1)| \le \sqrt{2(i+1)\,i^{c+1}}\, \log\Big((i-1)^{c+2} + \sqrt{2(i+1)\,i^{c+1}}\Big) \le i^{(c+3)/2} \]

for i large enough (i ≥ N_0). Thus for n sufficiently large

\[ |LZ(S_n)| = \sum_{j=1}^{n} |LZ(j \mid j-1)| = \sum_{j=1}^{N_0-1} |LZ(j \mid j-1)| + \sum_{j=N_0}^{n} |LZ(j \mid j-1)| \le n + n \cdot n^{(c+3)/2} \le n^{(c+6)/2} \]

for n large enough, which ends the proof of Lemma 4.5. □

Lemma 4.6 Let S_{n,t} = R_1 R_2^{2^c} ... R_n^{n^c} R_{n+1}^t where 1 ≤ t ≤ (n + 1)^c. Then

\[ \frac{|LZ(S_{n,t})|}{|S_{n,t}|} \le \frac{n^{(c+7)/2}}{n^{c+1}} \]

for n large enough.

Proof of Lemma 4.6.

Using Lemma 4.5 we have

\[ |LZ(S_{n,t})| = |LZ(S_n)| + |LZ(R_{n+1}^t \mid S_n)| \le n^{(c+6)/2} + |LZ(R_{n+1}^t \mid S_n)|. \]

Applying Lemma 4.4 with w = R_{n+1}^t, d ≤ |S_n| ≤ n^{c+2}, l = n + 2, |w| = t(n + 1) yields (for n
large enough)

\[ |LZ(R_{n+1}^t \mid S_n)| \le \sqrt{2t(n+1)(n+2)}\, \log\Big(n^{c+2} + \sqrt{2t(n+1)(n+2)}\Big). \]

Whence

\[ \frac{|LZ(S_{n,t})|}{|S_{n,t}|} \le \frac{n^{(c+6)/2} + \sqrt{2t(n+1)(n+2)}\, \log\big(n^{c+2} + \sqrt{2t(n+1)(n+2)}\big)}{n^{c+1}} \le \frac{n^{(c+7)/2}}{n^{c+1}} \]

which ends the proof of Lemma 4.6. □

Lemma 4.7 For almost every k, |LZ(S[1...k])|/k ≤ k^{(-1+9/(c+3))/2}, i.e. for c > 7, R_LZ(S) =
0.

Proof of Lemma 4.7. Let k ∈ N and let n, t, l (0 ≤ l ≤ n, 0 ≤ t < (n + 1)^c) be such
that S[1...k] = S_n R_{n+1}^t R_{n+1}[1...l]. On R_{n+1}[1...l], LZ outputs at most n log(|S[1...k]|) =
O(n log n) symbols. Since k ≤ (n + 1)^{c+2} ≤ n^{c+3}, Lemma 4.6 yields

\[ \frac{|LZ(S[1\ldots k])|}{k} \le \frac{n^{(c+7)/2} + O(n\log n)}{n^{c+1}} \le n^{(6-c)/2} \le k^{(-1+9/(c+3))/2}. \]

□

Let us show that the sequence S is not compressible by ILplogs. For this we show that 
each large substring x of the input that is a Kolmogorov random word cannot be compressed 
by a plogon transducer, independently of the computation performed before processing x. 

Let C be an ILplog. For strings α, β, x with z = αxβ and |z| = m, denote by C(s, x, m)
the output of C starting in configuration s and reading x out of an input of length m. A
valid configuration is a configuration s such that there exists a string c such that C(s_0, c, m)
ends in configuration s, where s_0 is the start configuration of C. For example if s is the
configuration of C after reading α, then C(s, x, m) is the output of C while reading part x of
input z = αxβ. Note that |s| ≤ log(m)^{O(1)}.

Lemma 4.8 Let C be an ILplog, running in space log^a m, and let 0 < T ≤ 1. Then for every
d ∈ N and almost every r ∈ N, for every random string x ∈ Σ^r (with K(x) ≥ T|x| log |Σ| - A
for some fixed constant A), for every M with |x| ≤ M ≤ |x|^d and for every valid configuration
s (|s| ≤ log^a M)

|C(s, x, M)| ≥ T|x| - log^{2a} |x|.
Proof of Lemma 4.8. Suppose by contradiction that $C(s, x, M) = p$, with $|p| < Tr - \log^{2a} r$;
denote by $s'$ the configuration of $C$ after having read $x$ starting in $s$. Then $p' = (s', s, M, r, p)$
($p'$ is encoded by doubling all symbols in $s', s, M, r$, separated by the delimiter 01, followed
by $p$) yields a program for $x$ (coded with alphabet $\Sigma$):

"Find $y$ with $|y| = r$ such that $C(s,y,M) = p$, and $C$ ends in configuration $s'$ after reading
$y$."

$y$ is unique: otherwise, suppose there are two strings $y, y'$ ($|y| = |y'|$) such that
$C(s, y, M) = C(s, y', M)$, and $C$ ends in the same configuration on $y$ and $y'$. Let $b$ be a string
that brings $C$ into configuration $s$. Then for $z = 1^{M-|b|-|y|}$ we have $C(byz) = C(by'z)$, which
contradicts $C$ being 1-1. Therefore $y$ is unique, i.e. $y = x$. Thus for $r$ sufficiently large

$$|p'| \le 2(|s'| + |s| + |M| + |r|) + |p| \le 2(\log^a r^d + \log^a r^d + \log r^d + \log r) + Tr - \log^{2a} r < Tr - \frac{\log^{2a} r}{2}$$

which contradicts the randomness of $x$. □
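The encoding of $p'$ used in the proof (every symbol of the header fields doubled, fields closed by the delimiter 01, then $p$ verbatim) can be illustrated with a small sketch. The function names `double`, `encode` and `decode` are hypothetical and not from the paper; the sketch assumes a binary alphabet and that the decoder knows the number of header fields.

```python
def double(s: str) -> str:
    """Write every symbol of a binary string twice."""
    return "".join(ch + ch for ch in s)


def encode(fields: list[str], payload: str) -> str:
    """Doubled fields, each closed by the delimiter 01, then the raw payload."""
    return "".join(double(f) + "01" for f in fields) + payload


def decode(code: str, num_fields: int) -> tuple[list[str], str]:
    """Recover the header fields and the payload from an encoded string."""
    fields, current, pos = [], [], 0
    while len(fields) < num_fields:
        pair = code[pos:pos + 2]
        pos += 2
        if pair == "01":            # delimiter: the current field is complete
            fields.append("".join(current))
            current = []
        elif pair in ("00", "11"):  # a doubled symbol of the current field
            current.append(pair[0])
        else:
            raise ValueError("malformed code")
    return fields, code[pos:]
```

The header costs twice the total field length plus two symbols per delimiter, which matches the term $2(|s'| + |s| + |M| + |r|)$ in the bound up to an additive constant.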

Lemma 4.9 Let $C$ be an ILplog, running in space $\log^a m$. Then for every $\epsilon > 0$ and for
almost every $m$, $\frac{|C(S[1\ldots m])|}{m} \ge 1 - \epsilon$, i.e., $\rho_{\mathrm{plogon}}(S) = 1$.

Proof of Lemma 14.91 Let e > and let e' = ^,^c+i ■ Let n, t, Z (0 < / < n, < t < n^) be 

such that ^[l . . .m] = Sn-lRnRn[^ . . .1]. 

The idea is to apply Lemma 14.81 to R^^^^ ■ ■ ■ rII-i ^ ^n^n[l . ■ - l]- Let d be such that 
(e'n)'^ > n^^"^ (for all n > 2), i.e. (e'n)'^ > m. By Lemma |4.8| C on input S[l...m], will 
output at least j — log^" j symbols on each Rj {e'n < j < n). Therefore 

$$|C(S[1\ldots m])| \ge \sum_{j=\epsilon' n}^{n-1} (j - \log^{2a} j)\, j^c + t\,(n - \log^{2a} n)$$

whence 

\CiS[l ...m])\ ^ EU'nfU - l"g'" i) + - l"g'" ^) > - "j) + tiri - log'" n) 



i:"=iJ"+^ + (i + l)^ " E"=i'j"+^ + + 

(1 - «)(E ■=e'^n J"+^ + {t + 1)^) (1 - a)n 

" (1 + «0(Ei=i' J"+^ + it + 1)^) TTjZl + {t + l)n 

where a, a' > can be chosen arbitrarily small (for n large enough). Let a,a' > be such 
that ^ > 1 - e/2. Thus 



\C(S[l...m])\ I- a I- a f^^ ,, I- a 



e n 



m 



> e/4 > e/4 

- 1 + a' 1 + a' ^^Ji^i'^+i -1 + a' n/3(n/3)'=+i ' 



1 — a 
~ 1 -q' 
> 1-e . 



- e/4 > 1 - e/2 - e/4 - e/4 



Since $\epsilon$ is arbitrary, $\rho_{\mathrm{plogon}}(S) = 1$. □
This finishes the proof of Theorem 4.3. □

4.3 Invertible pushdown beats plogon compressors

In this section we take the most restrictive classes of pushdown compressors, namely invertible 
pushdown automata and visibly pushdown automata, and show that they both outperform 
plogon compressors. 

The proof is based on using a list of Kolmogorov random strings together with their 
reverses to construct the sequence witnessing the separation. A careful choice of the length 
of these random strings makes the result incompressible by plogon devices. 

Theorem 4.10 For each $\epsilon > 0$ there exists a sequence $S$ such that

$R_{\mathrm{invPD}}(S) \le 1/2$ and $\rho_{\mathrm{plogon}}(S) \ge 1 - \epsilon$.

Proof. Let $\epsilon_1, \epsilon_2 > 0$ and let $k \in \mathbb{N}$ to be determined later (as $k > 4/\epsilon_2$).

We first notice that for each $m \in \mathbb{N}$ there is a string $y \in \Sigma^*$ with $|y| = km$ and such that
$y[ik+1 \ldots (i+1)k] \ne 1^k$ for every $i$ and $K(y) \ge \frac{k-1}{k}|y| \log|\Sigma|$. This can be proved by a simple
counting argument.

Let $t_n = k^{\lceil \log n / \log k \rceil}$, so that

$$n \le t_n \le nk. \qquad (4)$$
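Read as "the smallest power of $k$ that is at least $n$", the quantity $t_n$ can be computed without floating-point logarithms. The following is a minimal sketch under that reading; the function name `flag_spacing` is hypothetical and not from the paper.

```python
def flag_spacing(n: int, k: int) -> int:
    """Smallest power of k that is at least n, i.e. k ** ceil(log n / log k)."""
    if n < 1 or k < 2:
        raise ValueError("need n >= 1 and k >= 2")
    t = 1
    while t < n:  # multiply by k until n is reached or passed
        t *= k
    return t
```

Because $t_n$ is a power of $k$, any period of the form $k^a$ divides $t_n$ once $n$ is large enough, which is the property used when the compressor counts modulo such a period.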

For each $n \in \mathbb{N}$ let $y_n \in \Sigma^{k t_n}$ be as above ($y_n[ik+1 \ldots (i+1)k] \ne 1^k$ for every $i$ and $K(y_n) \ge
\frac{k-1}{k}|y_n| \log|\Sigma|$).

Consider the sequence $S = y_1 1^k y_1^{-1}\, y_2 1^k y_2^{-1} \ldots y_n 1^k y_n^{-1} \ldots$ We will refer to the $1^k$ separators
as flags. Consider the following invertible pushdown compressor $(C, D)$. Informally, on
both $y_j$ and flag zones, $C$ outputs the input. On a $y_j^{-1}$ zone, $C$ outputs a zero for every $1/\epsilon_1$
symbols, and checks using the stack that the input is indeed $y_j^{-1}$. If the test fails, $C$ outputs
an error flag, enters an error state, and from then on it outputs the input.
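The informal behaviour of $C$ on error-free input can be sketched as follows. This is a simplified illustration, not the automaton defined below: it assumes the zone lengths are known in advance (the real compressor detects flags itself), handles only complete blocks $y_j 1^k y_j^{-1}$, and raises an exception where the paper switches to an error mode. All function names and the `period` parameter (standing for $1/\epsilon_1$) are hypothetical.

```python
def build_sequence(ys: list[str], k: int) -> str:
    """Concatenate the blocks y 1^k y^{-1} for the given strings."""
    return "".join(y + "1" * k + y[::-1] for y in ys)


def compress(s: str, lengths: list[int], k: int, period: int) -> str:
    out, pos = [], 0
    for length in lengths:
        y = s[pos:pos + length]
        flag = s[pos + length:pos + length + k]
        rev = s[pos + length + k:pos + 2 * length + k]
        if flag != "1" * k or len(rev) != length:
            raise ValueError("input does not follow the zone layout")
        out.append(y + flag)       # y zone and flag are copied verbatim
        stack = list(y)            # y is pushed symbol by symbol
        for count, ch in enumerate(rev, start=1):
            if stack.pop() != ch:  # the paper enters an error mode here
                raise ValueError("reverse zone mismatch")
            if count % period == 0:
                out.append("0")    # one output symbol per `period` inputs
        pos += 2 * length + k
    return "".join(out)


def decompress(c: str, lengths: list[int], k: int, period: int) -> str:
    out, pos = [], 0
    for length in lengths:
        y = c[pos:pos + length]
        out.append(y + "1" * k + y[::-1])  # the reverse zone is rebuilt from y
        pos += length + k + length // period
    return "".join(out)
```

On each block the output has length $|y_j| + k + |y_j|/\text{period}$ against an input of length $2|y_j| + k$, which is where the ratio close to $1/2$ comes from.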

The complete definitions of $C$ and $D$ are given for the sake of completeness. Let $A > 1/\epsilon_1$
with $A = k^a$ for some $a \in \mathbb{N}$, thus guaranteeing that $A$ divides $t_n$ for almost every $n$. The set of
states $Q$ is:

• the start state Qq 

• the counting states ql,...,ql and qo, with b = k 22j=i {^tj + 1) 

• the flag checking states . . . , q^^ and , . • • > ' 

• the pop flag states qg, . . . jqj. 

• the compress states qf, . . . , q'x+i 

• the error state q^. 

We now describe the transition function 6 : Q x 'E* x "E* ^ Q x T,* . At first C counts from 
to g|. This guarantees that for later y^, ^| For < z < 6 — 1 let 

^{(ii,x,y) = {qt+i,y) 

and 

<^(56,A,2/) = {qo,y). 
After counting has taken place, a new $y$ zone starts; the input is pushed to the stack, and
it is checked for the flag, by groups of $k$ symbols.



and for 1 < i < — 1 



5{ql\x,y) = {qfl-^^xy) 



[Qi ,xy) if X = 1 
iq(°,xy) if X / 1 



/o 



S{<l{\x,y) 



{4+1^ xy) ifa:=l 
iQ{li,xy) ifx/1 
If the flag has not been detected after k symbols, the test starts again. 

6{ql°,\y) = {qo,y). 

If the flag has been detected the pop flag state is entered 

%{\A,y) = (g5,y). 

Since the flag has been pushed to the stack it has to be removed, thus for < i < — 1 

S{qL\y) = {qi,y)- 

C then checks using the stack that the input is indeed yj^ , counting modulo A. If the test 
fails, an error state is entered, thus for 1 < i < ^ 



S{qi,x,y) = < 



(gf+i,A) iix = y 

{q'',y) if X / y and y / zo 

(gf ,xzo) if X = 1, y = zo 

{q(",xzo) if 7^ 1, y = Zo 



Once A symbols have been checked, the test starts again 

%A+i,A,2/) = (qly). 

The error state is a loop, S{q^,x,y) = {q'^,y)- 

We next describe the output function : Q X S* X S* — ^ S*. First on the counting states, 
the input is output, i.e., for < i < 6 — 1 

^{Qi,x,y) = X. 

On the flag states the input is output, thus for 1 < z < A; — 1, a G {0, 1} 

T^{qt,x,y) = X. 

There is no output on popping states gj, . . . , and on compressing states gj, . . . , q\^i except 
after A symbols have been checked i.e. 



i^{Q%x,y) = if X = y 






On error, l*Ox is output, i.e. for 1 < i < A 

I'iQi, X, y) = l*Ox if X 7^ y and y zq. 

On the error state, the input is output, that is, u[q^,x,y) = x. 

Let us verify $C$ is IL, that is, the input can be recovered from the output and the final
state. If the final state is not an error state, then all $y_j$'s and all flags are output as in
the input. If the final state is $q^c_i$, then the number $t$ of zeroes after the last flag (in the output),
together with the final state $q^c_i$, determines that the last $y_j^{-1}$ zone is $tA + i - 1$ symbols long.

If the final state is an error state, then the output is of the form (suppose the error
happened in the $y_j^{-1}$ zone)

$$a\, y_j\, 1^k\, 0^t\, 1^i\, 0\, b$$

with $a, b \in \Sigma^*$. The input is uniquely determined to be the input corresponding to output
$a\, y_j\, 1^k\, 0^t$ with final state $q^c_i$, followed by

$$y_j^{-1}[tA+1 \ldots tA+i-1]\, b.$$

We give the definition of the inverter D. The set of states Q' is: 



the 


start state q^ 




the 


counting states qf, ■ ■ ■ ,ql,qo, with b 




the 


flag checking states q(^ , . . . , qj^ and 




the 


pop flag states ^q, . . . , 




the 


decompress states for u G S-^ 




the 


copy states q^ for u G S-"^ 




the 


output state q° 





D receives as input a string followed by a state qj e Q. Let us describe the transition function 
S' : Q' xT,* xT,* ^ Q' X S* and the output function z/' : Q' x S* x S* ^ S* in parallel. At 
first D counts from q^ to q^, i.e., for < i < 6 — 1 let 

S'{qt,x,y) = {qUi,y) 

and 

s'iqb^^^y) = (90, y)- 

On the counting states, the input is output, i.e., for < i < 5 — 1 

'^'iQi,x,y) =x. 

At first the input is pushed to the stack, and it is checked for the flag, by groups of k 
symbols. 

S'{qQ X y) = l^^i''^^) if X = 1 
\{q{°,xy) ifx/1 
and for $1 \le i \le k - 1$

S'{q{°,x,y) = {q!l„xy) 
If the flag has not been detected after k symbols, the test starts again. 

s'{qtKy) = {qo,y). 

If the flag has been detected the pop flag state is entered 

s'{qi\x,y) = {q'o,y)- 

Since the flag has been pushed to the stack it has to be removed, thus for < i < — 1 

S'{ql,X,y) = {ql+iA) 

S'iql,\y) = {qiy) 

On the flag states the input is output, i.e. for 1 < i < — 1, a G {0, 1} 

v'{qt,x,y) = X, 

^'{qo,x,y) = X. 

There is no output on popping states Qq, . . . , g^. 

The decompressing states pop and memorize A symbols of the stack 

S'{qtX,y) = (qiy,^) for \u\ < A. 
If \u\ = A then, depending on the next symbol, should be output 

d'{qtAy) = iqiy) iiyT^zo. 

5'{qt,0,zo) = iqo,zo). 
u'{qiO,y)=u-\ 

If 1 is found then there is an error 

6'{qtl,y) = {q:,y). 

S'{qZ,i,y) = {q:,y). 
u'{qZ,l,y) = b. 

S'{qZAy) = {q°,y). 

If the next symbol is a state then the y^^ zone was not complete 

i^'{qtq^,y)=u-'[l..i-l]. 

Once the error has been passed, D stays in the output state. S'{q°,x,y) = {q°,y), 
^'{q°,x,y) = X. 
This ends the description of $(C, D)$.

Let us compute the compression ratio of $C$. Since the counting part on the first $b$ symbols
of $S$ is of constant size, it is negligible for computing the compression ratio for $n$ large
enough; therefore we can assume wlog that $C$ starts compressing immediately, i.e. $b = 0$.
Moreover, the ratio is largest just after a flag $1^k$, whence

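$$\frac{|C(y_1 1^k y_1^{-1} y_2 1^k y_2^{-1} \ldots y_n 1^k)|}{|y_1 1^k y_1^{-1} y_2 1^k y_2^{-1} \ldots y_n 1^k|} \le \frac{k(1+\epsilon_1) \sum_{j=1}^{n} t_j + nk - \epsilon_1 k t_n}{2k \sum_{j=1}^{n} t_j + nk - k t_n}$$

$$\le \frac{1 + \epsilon_1}{2} + \frac{n/2}{\sum_{j=1}^{n-1} t_j} + \frac{t_n/2}{\sum_{j=1}^{n-1} t_j}$$

$$\le 1/2 + \epsilon_1/2 + \frac{n}{n(n-1)} + \frac{nk}{n(n-1)}$$

$$\le 1/2 + \epsilon_1/2 + \epsilon_1/4 + \epsilon_1/4 = 1/2 + \epsilon_1$$

for $n$ sufficiently large, using $j \le t_j \le jk$ from (4). Since $\epsilon_1$ is arbitrary,

$$R_{\mathrm{invPD}}(S) \le 1/2.$$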

We now compute the compression ratio of a plogon compressor on $S$. Let $m \in \mathbb{N}$ and let
$n \in \mathbb{N}$ be such that

$$S[1 \ldots m] = y_1 1^k y_1^{-1}\, y_2 1^k y_2^{-1} \ldots (y_n 1^k y_n^{-1})[1 \ldots i]$$

with $1 \le i \le k(1 + 2t_n)$. Let $C$ be an ILplog, running in space $\log^a m$. Let $\epsilon' = \epsilon_2/8k$.
Applying Lemma 4.8 with $d = 3$ and $r$ ranging over $\epsilon' n \le r \le n$ (such that $r \le m \le r^3$ for $n$
sufficiently large), we have that for every $j \in \{\epsilon' n, \ldots, n\}$

$$|C(s, y_j^{\delta}, m)| \ge T|y_j| - \log^{2a}(|y_j|)$$

where $\delta = \pm 1$. Letting $s_j$ (resp. $s'_j$) ($j \in \{\epsilon' n, \ldots, n\}$) denote the configuration of $C$ reached
on input $S[1 \ldots m]$ just before reading the first symbol of $y_j$ (resp. $y_j^{-1}$), we have

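$$|C(S[1 \ldots m])| \ge \sum_{j=\epsilon' n}^{n-1} |C(s_j, y_j, m)| + \sum_{j=\epsilon' n}^{n-1} |C(s'_j, y_j^{-1}, m)|$$

$$\ge 2 \sum_{j=\epsilon' n}^{n-1} \left(T|y_j| - \log^{2a}|y_j|\right)$$

$$\ge 2 \sum_{j=\epsilon' n}^{n-1} \left(T|y_j| - \gamma|y_j|\right)$$

$$= 2(T - \gamma) \sum_{j=\epsilon' n}^{n-1} |y_j|$$

with $\gamma > 0$ arbitrarily close to 0, for $n$ large enough. Choosing $\gamma$ and $T = \frac{k-1}{k}$ such that
$T - \gamma > 1 - \epsilon_2/4$ (taking $k > 4/\epsilon_2$) yields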



\C{S[l...m])\ ^2(r-7)E; 
|5[l...m]| 



n-l 



nk + 2 YTj=l ^tj 



>(r-7)-(r-7)[ 



n/2 



>(r-7)-(r-7)[- 



n 



+ 



+ 



2kn 



+ 



e'n-l , 
1 ''ii 



'n^n — 1) n{n — 1) 

> 1 - £2/4 - €2/4 - 62/4 - £2/4 

> 1 - €2 



+ 



ke'n{e'n — 1) ^ 
n(n — 1) 



for $n$ sufficiently large, and

$$\rho_{\mathrm{plogon}}(S) \ge 1 - \epsilon_2.$$

□

Even visibly pushdown automata, extensively used in the compression of XML, can beat 
plogon compressors. The definition of visibly pushdown automata can be found in section 
Theorem 4.11 There exists a sequence $S$ such that

$R_{\mathrm{visiblyPD}}(S) \le 1/2$ and $\rho_{\mathrm{plogon}}(S) \ge 1 - \epsilon$.

Proof.

The proof is a variation of the proof of Theorem 4.10. If the alphabet $\Sigma$ has $2t$ symbols, this
time the sequence used is $S = y_1 Y_1^{-1}\, y_2 Y_2^{-1} \ldots y_n Y_n^{-1} \ldots$, where the $y_i$ are Kolmogorov random
strings over the first $t$ symbols of the alphabet, and $Y_i$ is the string obtained from $y_i$ by
changing each symbol $a$ into symbol $a + t$, that is, $Y_i$ contains only the last $t$ symbols of the
alphabet. □
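The point of the shifted alphabet is that the input symbol alone decides whether the automaton pushes or pops. A minimal sketch of one block $y\,Y^{-1}$ and of the corresponding stack check follows; it assumes symbols are numbered $0, \ldots, 2t-1$, and the names `shift`, `build_block` and `check_block` are hypothetical, not from the paper.

```python
def shift(y: list[int], t: int) -> list[int]:
    """Map each symbol a of the first half of the alphabet to a + t."""
    return [a + t for a in y]


def build_block(y: list[int], t: int) -> list[int]:
    """Return y followed by the reverse of its shifted copy Y."""
    return y + shift(y, t)[::-1]


def check_block(block: list[int], t: int) -> bool:
    """Push symbols below t, pop on symbols >= t.

    Returns True when every high symbol matches the low symbol on top of
    the stack and the stack is empty at the end.
    """
    stack = []
    for a in block:
        if a < t:
            stack.append(a)          # symbols of the first half always push
        elif not stack or stack.pop() != a - t:
            return False             # symbols of the second half always pop
    return not stack
```

Since the push/pop decision depends only on which half of the alphabet the symbol belongs to, no flag is needed to separate $y_i$ from $Y_i^{-1}$.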



4.4 Lempel-Ziv is not universal for Pushdown compressors 

It is well known that LZ [18] yields a lower bound on the finite-state compression of a sequence
[18], i.e., LZ is universal for finite-state compressors.

The following result shows that this is not true for pushdown compression, in a strong
sense: we construct a sequence $S$ that is infinitely often incompressible by LZ, but that has
almost everywhere pushdown compression ratio at most $\frac{1}{2}$.

Theorem 4.12 For every $\epsilon > 0$, there is a sequence $S$ such that

$$R_{\mathrm{invPD}}(S) \le \frac{1}{2}$$

and

$$\rho_{LZ}(S) \ge 1 - \epsilon.$$






Proof. Let $\epsilon > 0$, and let $k = k(\epsilon)$, $v = v(\epsilon)$, $v' = v'(\epsilon)$ be integers to be determined later.
For any integer $n$, let $T_n$ denote the set of strings $x$ of size $n$ such that $1^j$ does not appear
in $x$, for every $j \ge k$. Since $T_n$ contains $\Sigma^{k-1} \times \{0\} \times \Sigma^{k-1} \times \{0\} \ldots$ (i.e. the set of strings
whose every $k$th symbol is zero), it follows that $|T_n| \ge |\Sigma|^{an}$, where $a = 1 - 1/k$.

Remark 4.13 For every string $x \in T_n$ there is a string $y \in T_{n-1}$ and a symbol $b$ such that
$yb = x$.

Let $A_n = \{a_1, \ldots, a_u\}$ be the set of palindromes in $T_n$. Since fixing the $n/2$ first symbols
of a palindrome (wlog $n$ is even) completely determines it, it follows that $|A_n| \le |\Sigma|^{n/2}$. Let
us separate the remaining strings in $T_n - A_n$ into $v$ pairs of sets $X_{n,i} = \{x_{i,1}, \ldots, x_{i,t}\}$ and
$Y_{n,i} = \{y_{i,1}, \ldots, y_{i,t}\}$ with $t = \frac{|T_n - A_n|}{2v}$, $(x_{i,j})^{-1} = y_{i,j}$ for every $1 \le j \le t$ and $1 \le i \le v$, and such that
$x_{i,1}$, $y_{i,t}$ start with a zero. For convenience we write $X_i$ for $X_{n,i}$.
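For small parameters, the sets $T_n$ and $A_n$ and the pairing of the remaining strings with their reverses can be enumerated directly. The sketch below assumes a binary alphabet; the function names are hypothetical and the grouping into the $v$ zones $X_i$, $Y_i$ is left out.

```python
from itertools import product


def t_set(n: int, k: int, alphabet: str = "01") -> list[str]:
    """All strings of length n that contain no run of k (or more) ones."""
    words = ("".join(w) for w in product(alphabet, repeat=n))
    return [w for w in words if "1" * k not in w]


def palindromes(strings: list[str]) -> list[str]:
    """The members of `strings` that equal their own reverse."""
    return [w for w in strings if w == w[::-1]]


def reverse_pairs(strings: list[str]) -> list[tuple[str, str]]:
    """Pair every non-palindrome x with its reverse, listing each pair once."""
    return [(w, w[::-1]) for w in strings if w < w[::-1]]
```

The pairing is well defined because a run of ones in a string is also a run of ones in its reverse, so $T_n$ is closed under reversal; Remark 4.13 corresponds to the fact that dropping the last symbol of a string in $T_n$ gives a string in $T_{n-1}$.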

We construct $S$ in stages. Let $f(k) = 2k$ and $f(n + 1) = f(n) + v + 1$. Clearly

$$n^2 > f(n) > n.$$

For $n \le k - 1$, $S_n$ is an enumeration of all strings of size $n$ in lexicographical order. For $n \ge k$,

$$S_n = a_1 \ldots a_u\; 1^{f(n)}\; x_{1,1} \ldots x_{1,t}\; 1^{f(n)+1}\; y_{1,t} \ldots y_{1,1}\; x_{2,1} \ldots x_{2,t}\; 1^{f(n)+2}\; y_{2,t} \ldots y_{2,1} \;\ldots\; x_{v,1} \ldots x_{v,t}\; 1^{f(n)+v}\; y_{v,t} \ldots y_{v,1}$$

i.e. a concatenation of all strings in $A_n$ (the $A$ zone of $S_n$) followed by a flag of $f(n)$ ones,
followed by the concatenations of all strings in the $X_i$ zones and $Y_i$ zones, separated by flags
of increasing length. Note that the $Y_i$ zone is exactly the $X_i$ zone written in reverse order.
Let

$$S = S_1 S_2 \ldots S_{k-1}\; 1^k\, 1^{k+1} \ldots 1^{2k-1}\; S_k S_{k+1} \ldots$$

i.e. the concatenation of the $S_j$'s with some extra flags between $S_{k-1}$ and $S_k$. We claim that
the parsing of $S_n$ ($n \ge k$) by LZ is as follows:

$$a_1, \ldots, a_u,\; 1^{f(n)},\; x_{1,1}, \ldots, x_{1,t},\; 1^{f(n)+1},\; y_{1,t}, \ldots, y_{1,1},\; \ldots,\; x_{v,1}, \ldots, x_{v,t},\; 1^{f(n)+v},\; y_{v,t}, \ldots, y_{v,1}.$$

Indeed after $S_1 \ldots S_{k-1}\, 1^k\, 1^{k+1} \ldots 1^{2k-1}$, LZ has parsed every string of size $\le k - 1$ and the
flags $1^k, \ldots, 1^{2k-1}$. Together with Remark 4.13, this guarantees that LZ parses $S_n$ into
phrases that are exactly all the strings in $T_n$ and the $v + 1$ flags $1^{f(n)}, \ldots, 1^{f(n)+v}$.

Let us compute the compression ratio $\rho_{LZ}(S)$. Let $n, i$ be integers. By construction of $S$,
LZ encodes every phrase in $S_i$ (except flags) by a phrase in $S_{i-1}$ plus one symbol. Indexing
a phrase in $S_{i-1}$ requires a codeword of length at least logarithmic in the number of phrases
parsed before, i.e. $\log(P(S_1 S_2 \ldots S_{i-2}))$. Since $P(S_i) \ge |T_i| \ge |\Sigma|^{ai}$, it follows that for almost
every $i$

$$P(S_1 \ldots S_{i-2}) \ge \sum_{j=1}^{i-2} |\Sigma|^{aj} = \frac{|\Sigma|^{a(i-1)} - |\Sigma|^{a}}{|\Sigma|^{a} - 1} \ge b\,|\Sigma|^{a(i-1)}$$

where the inequality holds because $a < 1$ (hence the denominator is less than 1). Letting
$t_i = |T_i|$, the number of symbols output by LZ on $S_i$ is at least

$$P(S_i) \log P(S_1 \ldots S_{i-2}) \ge t_i \log\left(b\,|\Sigma|^{a(i-1)}\right) \ge c\, t_i (i - 1)$$

where $c = c(a)$ can be made arbitrarily close to 1, by choosing $a$ accordingly. Therefore

$$|LZ(S_1 \ldots S_n)| \ge \sum_{i=k}^{n} c\, t_i (i - 1).$$

Since 



|5i . . . Sn\ = \Si... Sk-ll ...l\ + \Sk...Sn\< in^" + Y^ijtj + iv + + V)) 

j=k 

and \LZ{Si . . . Sn)\ > Yl^=k ^^jU " 1)) the compression ratio is given by 

p^M^l ■ ■ ■ ^nj _ C|^|3, ^ ^ ^ ^ 

_ + EU^Jh + + tjjj - 1)) 

_ |sp + E-=fefe + (^ + i)(/(i)+^)) 

The second term in this equation can be made arbitrarily small for n large enough: Let 
k < M < n/3, we have 



n M n 

j=k j=k j=M+l 



M n n 

j=k j=M+l j=M+l 

M n n 

j=k j=M+l j=M+l 

M n 

>Y.jt,+M t, + isr 

j=k j=M+l 



We have 

M n 

\Lr > Mm^^ + ^ t . + + 1) Y^{f[j) + v)] 

j=k j=k 

for n large enough, because f{j) < j^. Hence 

in'" + EUi^j + iv + +v)) _ + E]=kih + + + ^) 



< c- 



+ E •=fe(ii.- + + + ^)) - M[|E|3^ + E^=fe(t, + (t' + i)(/(j) + v))] M 

i.e. 
which by definition of $c$, $M$ can be made arbitrarily close to 1 by choosing $k$ accordingly, i.e.

$$\rho_{LZ}(S_1 \ldots S_n) \ge 1 - \epsilon.$$

Let us show that $R_{PD}(S) \le \frac{1}{2}$. Consider the following ILPD compressor $C$. First $C$
outputs its input until it reaches zone $S_k$. Then on any of the zones $A$, $X_i$ and the flags, $C$
outputs them symbol by symbol; on $Y_i$ zones, $C$ outputs one zero for every $v'$ symbols of
input. To recognize a flag: as soon as $C$ has read $k$ ones, it knows it has reached a flag. For
the stack: $C$ on $S_n$ cruises through the $A$ zone up to the first flag, then starts pushing the
whole $X_1$ zone onto its stack until it hits the second flag. On $Y_1$, $C$ outputs a 0 for every
$v'$ symbols of input, pops one symbol from the stack for every symbol of input, and cruises
through $v'$ counting states, until the stack is empty (i.e. $X_2$ starts). $C$ keeps doing the same
for each pair $X_i, Y_i$ for every $2 \le i \le v$. Therefore at any time, the number of symbols of $Y_i$
read so far is equal to $v'$ times the number of symbols output on the $Y_i$ zone plus the index of
the current counting state. On the $Y_i$ zones, $C$ checks that every symbol of $Y_i$ is equal to the
symbol it pops from the stack; if the test fails, $C$ enters an error state, outputs an error flag
and thereafter outputs every symbol it reads (this guarantees IL on sequences different from
$S$). This, together with the fact that the $Y_i$ zone is exactly the $X_i$ zone written in reverse
order, guarantees that $C$ is IL. Before giving a detailed construction of $C$, we compute the
upper bound it yields on $R_{PD}(S)$.

Remark 4.14 For any $j \in \mathbb{N}$, let $p_j = C(S[1 \ldots j])$ be the output of $C$ after reading $j$ symbols
of $S$. It is easy to see that the ratio $\frac{|p_j|}{|S[1 \ldots j]|}$ is maximal at the end of a flag following an $X_i$
zone, since the flag is followed by a $Y_i$ zone, on which $C$ outputs one symbol for every $v'$ input
symbols.

Let < I < V. We compute the ratio inside zone Sn on the last symbol of the flag 

following At this location (denoted j'o), C has output 

n— 1 . ^ 

\Pjo\ < \n'' + Eb'l^il + + + ^) + - + -)] + n\An\ + {v + l)(/(n) + v) 

j=k 
j=k 

where p > can be made arbitrarily close to ^ for n large enough. 
The number of symbols of S at this point is 

n— 1 ^ 

\S[1 ...jo]\> J^ilT.-l + n\An\ + ^\Tn-An\{I + -) 

j=k 

>j:m+^\Tn\{i+\) 

j=k 
Hence by Remark 4.14

\Pn\ ^ + E"=fe [11^.1(1 + ^)] + i^\Tn\{I + 1 + ^) 

lim sup -j-— TT < iim sup 



\s[i...n]\ ^-ij-|r,.| + n|r„|(/ + i) 

n-l 



limsup[ , + 



+ f |T„|(I + i) 2 E-=fe'i|T,| + f |r„|(/ + i) 



2^'E-=fcj|r,| + ^|r„|(/ + i) 2^ E-=Mr,| + f|T„|(/ + i)^ 

Since E"=fc il^jl > - ^)\Tn-i\ > (n - 1)^, we have 

1—1 1 

^ j|r,| + ^|T„|(/ + -) > ^\T^\ + y^^iii + -) 



j=k 



Therefore 



Hm sup — -— j — < hm sup , , 



< hm sup -— — = 



and 



which is arbitrarily small by choosing v' accordingly, and 



which is arbitrarily small by choosing v accordingly. Thus 

$$R_{PD}(S) = \limsup_{n \to \infty} \frac{|p_n|}{|S[1 \ldots n]|} \le \frac{1}{2}.$$

For the sake of completeness we give a detailed description of C. Let Q be the following 
set of states: 

• The start state qo, and qi, . . . the "early" states that will count up to 

w = \SiS2...Sk^i l'' 1'=+^ ...l^'^-ij. 

• qQ,...,q^ the A zone states that cruise through the A zone up to the first flag. 

• qj the jth flag state, {j = 1, . . . ,v + 1) 
• $q_0^{X_j}, \ldots, q_k^{X_j}$ the $X_j$ zone states that cruise through the $X_j$ zone, pushing every symbol
on the stack, until the $(j + 1)$-th flag is met, ($j = 1, \ldots, v$).

• $q_1^{Y_j}, \ldots, q_{v'}^{Y_j}$ the $Y_j$ zone states that cruise through the $Y_j$ zone, popping a symbol
from the stack (per input symbol) and comparing it to the input symbol, until the stack
is empty, ($j = 1, \ldots, v$).

• Qq-' , . . . ,0^-' which after the jth flag is detected, pop k symbols from the stack that 
were erroneously pushed while reading the jth. flag, {j = 2, . . . ,v + 1). 

• Qe, Qe' the error states, if one symbol of Yi is not equal to the content of the stack. 

We next describe the transition function $\delta : Q \times \Sigma^* \times \Sigma^* \to Q \times \Sigma^*$. First $\delta$ counts up to $w$,
i.e. for $i = 0, \ldots, w - 1$

$\delta(q_i, x, y) = (q_{i+1}, y)$ for any $x, y$
and after reading $w$ symbols, it enters the first $A$ zone state, i.e. for any $x, y$

S{qw,x,y) = {qo,y). 

Then 6 skips through A until the string 1*^ is met, i.e. for i = 0, ... A; — 1 and any x, y 



Kqi :X,y) 

and 



{qf+i,y) ifx = l 
{q^,y) if X 7^1 



^{qi^x^y) = {q{^y)- 

Once \^ has been seen, 6 knows the first fiag has started, so it skips through the flag until a 
zero is met, i.e. for every x,y 



5{qi^x,y) 



{q{,y) ifx = l 
(g^SOy) ifx = 



where state q^^ means that the first symbol of the Xi zone (a zero symbol) has been read, 
therefore 5 pushes a zero. In the Xi zone, delta pushes every symbol it sees until it reads a 
sequence of k ones, i.e up to the start of the second flag, i.e for z = 0, . . . /c — 1 and any x, y 

X, ^ ,x _ j(.qf+\^^y) if a; = 1 



s{qt\x,y) 

and 



(^0^' ,xy) if X / 1 



S{qk \x,y) = (go' ,y). 

At this point, 6 has pushed all the Xi zone on the stack, followed by k ones. The next step 
is to pop k ones, i.e for i = 0, ... fc — 1 and any x, y 

S{ql'^,x,y) = (g[+i,A) 

and 

Siqk^,x,y) = {qi^y)- 
At this stage, $\delta$ is still in the second flag (the second flag is always bigger than $2k$) therefore
it keeps on reading ones until a zero (the first symbol of the $Y$ zone) is met. For any $x, y$



5{q^,x,y) 



{ql\X) ifx = 0. 



On the last step, $\delta$ has read the first symbol of the $Y_1$ zone, therefore it pops it. At this stage,
the stack exactly contains the $X_1$ zone written in reverse order (except the first symbol); $\delta$
thus uses its stack to check that what follows is really the $Y_1$ zone. If it is not the case, it
enters $q_e$. While cruising through $Y_1$, $\delta$ counts with period $v'$. Thus for $i = 1, \ldots, v' - 1$ and
any $x, y$

Yi 



K(lJ\x,y) 

and 



(gi+i,A) ■\ix = y 
{qe, A) otherwise 



{ql^ , A) if X = y 
{qe,X) otherwise 

Once the stack is empty, the X2 zone begins. Thus, for any x,y, 1 < i < v' 



S{ql\x,zo) 



(<7f^lzo) ifx = l 
{q^\Ozo) ifx = 0. 



Then for 2 < j < v the states corresponding to the Xj and Yj zones behave similarly 

(that is, states qf\ ql'^^^ , Qj+i, and gp). 

At the end of 1^, a new A zone starts, thus for any 1 < i < v' 



{Qi,zo) if X = 1 
(QoiZq) if X = 0. 



Once in the qe state the compressor outputs a flag then enters state q^', from that point 
it simply outputs the input, thus 

5{qe,X,X) = {qe',X) 

and 



S{qe',x,y) = {qe',y) 

The output function $\nu$ outputs the input on every state, except on states $q_1^{Y_j}, \ldots, q_{v'}^{Y_j}$
($j = 1, \ldots, v$), where for $1 \le i < v'$

$$\nu(q_i^{Y_j}, b, y) = \lambda$$

and

$$\nu(q_{v'}^{Y_j}, b, y) = 0,$$

and $q_e$, where a flag is output, i.e.,

$$\nu(q_e, \lambda, \lambda) = 10.$$

Finally, with a similar construction as in the proof of Theorem 4.10, the inverse of $C$ can
be computed by a pushdown compressor, showing that $C$ is invPD.

□

4.5 plogon beats Lempel Ziv

Our next result uses a Copeland-Erdős sequence [6, 7] on which Lempel-Ziv has maximal
compression ratio, whereas with logspace each prefix of the sequence can be completely
reconstructed from its length.

Theorem 4.15 There exists a sequence $S$ such that

$R_{\mathrm{plogon}}(S) = 0$ and $\rho_{LZ}(S) = 1$.

Proof. Let $S = E(\Sigma^*)$ be the enumeration of strings over $\Sigma$ in the standard lexicographical
order. LZ does not compress $S$ at all; for this algorithm it is the worst possible case, i.e.

$$\rho_{LZ}(S) = 1.$$
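The reason LZ fares so badly on the enumeration can be checked on a small prefix. The sketch below assumes that LZ refers to the LZ78 incremental parsing (each phrase is the shortest prefix of the remaining input that is not yet a phrase) and a binary alphabet; the function names are hypothetical.

```python
from itertools import product


def lz78_phrases(s: str) -> list[str]:
    """Split s into LZ78 phrases: each one is a new, previously unseen string."""
    seen, phrases, current = set(), [], ""
    for ch in s:
        current += ch
        if current not in seen:  # shortest unseen prefix closes the phrase
            seen.add(current)
            phrases.append(current)
            current = ""
    if current:                  # trailing, possibly repeated, last phrase
        phrases.append(current)
    return phrases


def enumeration(max_len: int, alphabet: str = "01") -> list[str]:
    """All non-empty strings up to max_len, by length and then lexicographically."""
    return ["".join(w)
            for n in range(1, max_len + 1)
            for w in product(alphabet, repeat=n)]
```

On the concatenation of `enumeration(4)`, the phrases returned by `lz78_phrases` are exactly the enumerated strings: every proper prefix of an enumerated string has already been seen, while the string itself has not, so the dictionary grows as fast as it possibly can.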

For any input $w$ with $|w| = n$, let $m \in \mathbb{N}$, $x \in \Sigma^*$ be such that $w = S[1 \ldots m]x$, and
$S[1 \ldots m + 1]$ is not a prefix of $w$. Then we define compressor $C$ as $C(w, |w|) = \mathrm{dbin}(m)01x$, where $\mathrm{dbin}(m)$
is $m$ written in binary with every bit doubled (such that the separator 01 can be recognized).
$C$ is clearly 1-1. $C$ is plogon, because on input $(w, n)$, $C$ reads the input online to check that
$w$ is a prefix of $S$ (i.e. the standard enumeration of strings over $\Sigma$); the biggest string to
check has size $\log n$, therefore the check can be done in plogon. As soon as the check fails, $C$
outputs the length (in binary, with every bit doubled) of the prefix of the input that satisfied
the check (at most $2 \log n$ bits) followed by 01 and the rest of the input.
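The map $w \mapsto \mathrm{dbin}(m)01x$ and its inverse can be sketched as follows. This is a plain simulation that assumes a binary alphabet and an enumeration starting with the strings of length 1; it materializes the whole prefix of $S$ and therefore does not respect the polylogarithmic space bound. The function names are hypothetical.

```python
from itertools import count, product


def enumeration_prefix(length: int, alphabet: str = "01") -> str:
    """First `length` symbols of the enumeration 0, 1, 00, 01, 10, 11, 000, ..."""
    out, total = [], 0
    for n in count(1):
        for w in product(alphabet, repeat=n):
            out.append("".join(w))
            total += n
            if total >= length:
                return "".join(out)[:length]


def dbin(m: int) -> str:
    """m in binary with every bit doubled."""
    return "".join(b + b for b in format(m, "b"))


def compress(w: str) -> str:
    ref = enumeration_prefix(len(w))
    m = 0
    while m < len(w) and w[m] == ref[m]:  # longest prefix of w that agrees with S
        m += 1
    return dbin(m) + "01" + w[m:]


def decompress(c: str) -> str:
    pos, bits = 0, []
    while c[pos:pos + 2] != "01":         # doubled bits are 00 or 11, never 01
        bits.append(c[pos])
        pos += 2
    m = int("".join(bits), 2)
    return enumeration_prefix(m) + c[pos + 2:]
```

On a prefix of $S$ of length $n$ the output is just the doubled binary expansion of $n$ followed by 01, i.e. about $2 \log n$ symbols.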

The worst case compression ratio for sequence $S$ is given by

$$R_{\mathrm{plogon}}(S) = \limsup_{n \to \infty} \frac{|C(S[1 \ldots n], n)|}{n} = \limsup_{n \to \infty} \frac{2 \log n}{n} = 0.$$

□

4.6 plogon beats Pushdown compressors 

The next result shows that plogon compressors outperform our most general family of push- 
down compressors on certain sequences. 

The proof is an extension of the intuition in Theorem 4.1: from a few Kolmogorov-random
strings a much longer pushdown-incompressible string can be constructed, even if an
identifying index for each string is included. The index can then be used by a polylogarithmic
compressor to compress the sequence optimally.

Theorem 4.16 There exists a sequence $S$ such that

$R_{\mathrm{plogon}}(S) = 0$ and $\rho_{PD}(S) = 1$.

Proof. Consider the sequence S = S1S2 ■ ■ ■ where Sn is constructed as follows. Let x = 
X1X2 ■ ■ ■ Xn2 {\xi\ = n) be a random string with K{x) > log Let 

$$S_n = x_1 x_2 \ldots x_{n^2}\; i_1 x_{i_1}\; i_2 x_{i_2} \ldots i_{2^n} x_{i_{2^n}}$$

where $i_j \in \{1, \ldots, n^2\}$ for every $1 \le j \le 2^n$ are indexes coded in $2 \log n$ bits, defined later on.

Let Ci, C2, ... be an enumeration of all ILPDCwE such that Cj can be encoded in at most 
i bits and such that a maximum of log^^^ i A-rules can be applied per symbol. 

The following claim shows that there are many $C$-incompressible strings $x_i$.

Claim 4.17 Let $F_n = \{C_1, \ldots, C_{\log n}\}$. Let $w \in \Sigma^*$.

1. Let $C \in F_n$. There are at least $(1 - \frac{1}{2 \log n}) n^2$ strings $i x_i$ ($1 \le i \le n^2$) such that

$$|C(w\, i x_i)| - |C(w)| > n - 2\sqrt{n}.$$

2. There is a string $x_i$ such that for every $C \in F_n$,

$$|C(w\, i x_i)| - |C(w)| > n - 2\sqrt{n}.$$

Proof of Claim 4.17. After having read $w$, $C$ is in state $q$, with stack content $yz$, where $y$
denotes the $(n + 2 \log n) \log^{(2)} n$ topmost symbols of the stack (if the stack is shorter then $y$
is the whole stack). It is clear that while reading an $i x_i$, $C$ will not pop the stack below $y$.

Let $T = (1 - \frac{1}{2 \log n}) n^2$, and let $C(q, yz, i x_i \$)$ denote the output of $C$ when started in state
$q$ on input $i x_i \$$ with stack content $yz$. Suppose the claim false, i.e. there exist more than
$n^2 - T$ words $i x_i$ such that $C(q, yz, i x_i \$) = p_i$, ends in state $q_i$, and $|p_i| < n - 2\sqrt{n} + O(1)$
(notice that the output on symbol $\$$ is $O(1)$). Denote by $G$ the set of such strings $x_i$. This
yields the following short program for $x$ (coded with alphabet $\Sigma$):

$$p = (n, C, q, y, a_1 t_1 a_2 t_2 \ldots a_{n^2} t_{n^2})$$

where each comma costs less than $3 \log|s|$, where $s$ is the element between two commas; $a_i = 1$
implies $t_i = x_i$, $a_i = 0$ implies $x_i \in G$ and $t_i = d(q_i)01d(|p_i|)01p_i$ (where $d(z)$, for any string $z$,
is the string written with every symbol doubled), i.e. $|t_i| \le n - \sqrt{n}$. $p$ is a program for $x$: once
$n$ is known, each $a_i t_i$ yields either $x_i$ (if $a_i = 1$) or $(p_i, q_i)$ (if $a_i = 0$). From $(p_i, q_i)$, simulating
$C(q, yz, u\$)$ for each $u \in \Sigma^{n + 2 \log n}$ yields the unique $u = i x_i$ such that $C(q, yz, u\$) = p_i$ and
ends in state $q_i$. The simulations are possible, because $C$ does not read its stack further than
$y$, which is given. We have

\p\ < 0(log n) + (n + 2 log n) log^^^ n + (n + l)T + (n^ -T){n- ^/n) 



2 log n 



^2.5 



41ogn 



which contradicts the randomness of $x$, thus proving part 1.

Let $W_j$ be the set of strings $i x_i$ that are compressible by $C_j$; by 1., $|W_j| \le n^2/(2 \log n)$. Let
$R = \{i x_i\}_{i=1}^{n^2} - \bigcup_{j=1}^{\log n} W_j$ be the set of strings incompressible by all $C \in F_n$. We have

$$|R| \ge n^2 - \log n \cdot \frac{n^2}{2 \log n} = n^2/2 > 1.$$

This proves part 2. □
We finish the definition of $S_n$ by picking $i_1 x_{i_1}$ to be the first string fulfilling the second
part of Claim 4.17 for $w = S_1 S_2 \ldots S_{n-1}$. The construction is similar for all strings $\{i_j x_{i_j}\}_{j=2}^{2^n}$,
by taking $w = S_1 S_2 \ldots S_{n-1} x_{i_1} \ldots x_{i_{j-1}}$, thus ending the construction of $S_n$.

Let us show that $\rho_{PD}(S) = 1$. Let $\epsilon > 0$ and let $C = C_k$ be an ILPDCwE. Because
$|S_1 \ldots S_{n-1}|$ is exponentially larger than the first $x_i$'s of zone $S_n$, for almost every $n$
and for all $0 \le t \le 2^n$ it is good enough to compute the compression ratio only after those first
$x_i$'s and after each $i x_i$. We have

|C(gi...5,,_i5;,[»^ + /(/-/ + 21og ».)]$)! 

\Si...Sn-iSn[n^ + t{n + 2logn)]\ 
^ E"=fc(2^)(j-2v^) + ^(n-2v^) 

~ E"=i' if + 2^' (j + 2 log j)) + n2 + i (n + 2 log n) 

^ (1 - «) E"=i j^' + + tn 2(t + 1) (1 - a) E ■=! j2^' 

~ (1 + «) E"=i' i2J- + n2 + j2j + + in E"=i' ^2^ + + in 

> l-e/4-e/4-e/4> 1-e 

where a can be made arbitrarily small for large enough n. 

We show that $R_{\mathrm{plogon}}(S) = 0$. Consider the following plogon compressor $C$, where every
output bit is output doubled except commas (coded by 10) and the error flag (coded by 01).
First $C$ outputs the length of the input (in binary) followed by a comma. For the $n^2$ first $x_i$'s
of zone $S_n$, $C$ outputs them (and stores them). For the remaining $i_j x_{i_j}$'s, only $i_j$ is output,
and $C$ checks that what follows $i_j$ is indeed $x_{i_j}$. If at any point in time the test fails, the
error mode is entered. In error mode, 01 is output, followed by the rest of the input, starting
right after the $i_j$ where the error occurred.
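The core of this compressor on a single zone $S_n$ can be sketched as follows. The sketch makes several simplifying assumptions that are not in the paper: a binary alphabet, 0-based indexes of fixed width, no doubling of output bits, no length prefix, and an exception instead of the error mode. The function names and the `pairs` parameter (the number of index/copy pairs in the zone) are hypothetical.

```python
def index_width(n: int) -> int:
    """Number of bits used for an index into the n*n stored blocks."""
    return max(1, (n * n - 1).bit_length())


def compress_zone(zone: str, n: int, pairs: int) -> str:
    width, head = index_width(n), n * n * n
    blocks = [zone[i * n:(i + 1) * n] for i in range(n * n)]  # stored x_i's
    out, pos = [zone[:head]], head       # the first n^2 blocks are copied
    for _ in range(pairs):
        idx_bits = zone[pos:pos + width]
        body = zone[pos + width:pos + width + n]
        if blocks[int(idx_bits, 2)] != body:
            raise ValueError("copy does not match the indexed block")
        out.append(idx_bits)             # only the index is output
        pos += width + n
    return "".join(out)


def decompress_zone(code: str, n: int, pairs: int) -> str:
    width, head = index_width(n), n * n * n
    blocks = [code[i * n:(i + 1) * n] for i in range(n * n)]
    out, pos = [code[:head]], head
    for _ in range(pairs):
        idx_bits = code[pos:pos + width]
        out.append(idx_bits + blocks[int(idx_bits, 2)])  # re-expand the copy
        pos += width
    return "".join(out)
```

Each pair of length about $n + 2\log n$ is replaced by roughly $2\log n$ symbols, and since a zone contains $2^n$ such pairs, the stored prefix of $n^3$ symbols is negligible.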

It is easy to check that $C$ is polylog space, since at the beginning of zone $S_n$, the available
space is of order poly($n$).

$C$ is IL, because from $C$'s output, we know the length of the input and whether the error
mode has been entered or not. If there is no error, all the first $x_i$'s of zone $S_n$ can be
recovered, followed by all strings $i_j x_{i_j}$. If the error mode is entered, by the previous argument
the sequence $S_n$ can be reconstructed up to the last $i_j$ before the error. The rest of the output
yields the rest of the sequence.

Let us compute the compression ratio. Let $\epsilon > 0$. Let $n \in \mathbb{N}$ and $0 \le t \le 2^n$. Because
$|S_1 \ldots S_{n-1}|$ is exponentially larger than the first $x_i$'s of zone $S_n$, it is good enough to
compute the compression ratio only after those first $x_i$'s. We have

\C{Si . . . Sn-iSn[n^ + i(n + 2 logn)])| ^ f + 2^ (2 log j) + + 2i log n] 

\Si... Sn-iSn[n^ + t{n + 21ogn)]| - P + 2^(j + 21ogi) + + t{n + 21ogn) 

2[E"=i^ 3 • 2^ log j + n^ + 2t log n] 
~ E"=i'i2i+n3 + in 

eEPi 2^1ogi] 
< ^^'-^ -, — + e/4 + e/4 

" e;=ij2^ 



Since log j < ^ j for all j > jo we have 

\C{S,...Sn-iSn[n^ + t{n + 2lognM ^ 6Ef=i 2^' log j] e/4[Z]ll+^ 
|5i...5„_i5„[n3 + i(n + 21ogn)]| " ^^Z^ j2J Ep'j2^' 

< e/4 + e/4 + e/2 < e. 

□ 
5 Conclusion



The equivalence of compression ratio, effective dimension, and log-loss unpredictability has 
been explored in different settings [HI 1131 120j . It is known that for the cases of finite-state, 
polynomial-space, recursive, and constructive resource-bounds, natural definitions of com- 
pression and dimension coincide, both in the case of infinitely often compression, related 
to effective versions of Hausdorff dimension, and that of almost everywhere compression, 
matched with packing dimension. The general matter of transformation of compressors in 
predictors and vice versa is widely studied [22]. 

In this paper we have done a complete comparison of pushdown, plogon and
LZ compression. It is straightforward to construct a prediction algorithm based on the
Lempel-Ziv compressor that uses similar computing resources, and it has been proved in [1] that
bounded-pushdown compression and dimension coincide. This leaves us with the natural
open question of whether each plogon compressor can be transformed into a plogon prediction
algorithm, for which the log-loss unpredictability coincides with the compression ratio of the
initial compressor, that is, whether the natural concept of plogon dimension coincides with
plogon compressibility. A positive answer would bring plogon computation closer to pushdown
devices, and a negative one would make it closer to polynomial-time algorithms, for which
the answer is likely to be negative [19].

References 

[1] P. Albert, E. Mayordomo, and P. Moser. Bounded pushdown dimension vs Lempel Ziv
information density. Technical Report TR07-051, ECCC: Electronic Colloquium on
Computational Complexity, 2007.

[2] P. Albert, E. Mayordomo, P. Moser, and S. Perifel. Pushdown compression. In Proceed- 
ings of the 25th Symposium on Theoretical Aspects of Computer Science (STACS 2008), 
pages 39-48, 2008. 

[3] N. Alon, Y. Matias, and M. Szegedy. The space complexity of approximating the fre- 
quency moments. Journal of Computer and System Sciences, 58:137-147, 1999. 

[4] R. Alur and P. Madhusudan. Adding nesting structure to words. In Proceedings of the 
Tenth International Conference on Developments in Language Theory, volume 4036 of 
Lecture Notes in Computer Science. Springer, 2006. 

[5] J. Autebert, J. Berstel, and L. Boasson. Context-free languages and pushdown automata.
In G. Rozenberg and A. Salomaa, editors, Handbook of Formal Languages, volume 1,
Word, Language, Grammar, pages 111-174. Springer-Verlag, 1997.

[6] D. G. Champernowne. Construction of decimals normal in the scale of ten. J. London
Math. Soc., 2(8):254-260, 1933.

[7] A. H. Copeland and P. Erdős. Note on normal numbers. Bulletin of the American
Mathematical Society, 52:857-860, 1946.

[8] J. J. Dai, J. I. Lathrop, J. H. Lutz, and E. Mayordomo. Finite-state dimension. Theo- 
retical Computer Science, 310:1-33, 2004. 



[9] S. Ginsburg and G. F. Rose. Preservation of languages by transducers. Information and 
Control, 9(2):153-176, 1966. 



[10] S. Ginsburg and G. F. Rose. A note on preservation of languages by transducers. Infor- 
mation and Control, 12(5/6):549-552, 1968. 

[11] S. Hariharan and P. Shankar. Evaluating the role of context in syntax directed compres-
sion of XML documents. In Proceedings of the 2006 IEEE Data Compression Conference
(DCC 2006), page 453, 2006.

[12] J. Hartmanis, N. Immerman, and S. Mahaney. One-way log-tape reductions. In Proceed- 
ings of the 19th Annual Symposium on Foundations of Computer Science (FOCS'78), 
pages 65-72. IEEE Computer Society, 1978. 

[13] J. M. Hitchcock. Effective Fractal Dimension: Foundations and Applications. PhD thesis, 
Iowa State University, 2003. 

[14] P. Indyk and D. P. Woodruff. Optimal approximations of the frequency moments of data
streams. In Proceedings of the 37th Annual ACM Symposium on Theory of Computing 
(STOC 2005), pages 202-208. ACM, 2005. 

[15] V. Kumar, P. Madhusudan, and M. Viswanathan. Visibly pushdown automata for stream-
ing XML. In International World Wide Web Conference WWW 2007, pages 1053-1062,
2007.

[16] J. I. Lathrop and M. J. Strauss. A universal upper bound on the performance of the
Lempel-Ziv algorithm on maliciously-constructed data. In B. Carpentieri, editor, Com-
pression and Complexity of Sequences '97, pages 123-135. IEEE Computer Society Press,
1998.

[17] C. League and K. Eng. Type-based compression of XML data. In Proceedings of the 2007
IEEE Data Compression Conference (DCC 2007), pages 272-282, 2007.

[18] A. Lempel and J. Ziv. Compression of individual sequences via variable rate coding.
IEEE Transactions on Information Theory, 24:530-536, 1978.

[19] M. López-Valdés and E. Mayordomo. Dimension is compression. In Proceedings of
the 30th International Symposium on Mathematical Foundations of Computer Science,
volume 3618 of Lecture Notes in Computer Science, pages 676-685. Springer-Verlag,
2005.

[20] E. Mayordomo. Effective fractal dimension in algorithmic information theory. In New 
Computational Paradigms: Changing Conceptions of What is Computable, pages 259- 
285. Springer-Verlag, 2008.

[21] E. Mayordomo and P. Moser. Polylog space compression is incomparable with Lempel-
Ziv and pushdown compression. In Proceedings of the 35th International Conference on
Current Trends in Theory and Practice of Computer Science (SOFSEM09), volume 5404,
pages 633-644. Springer Lecture Notes in Computer Science, 2009.



[22] D. Sculley and C. E. Brodley. Compression and machine learning: A new perspective on 
feature space vectors. In Proceedings of the Data Compression Conference (DCC-2006), 
pages 332-341, 2006. 



